* Looking for the cause of poor I/O performance
@ 2004-12-02 16:38 TJ
2004-12-03 0:49 ` Mark Hahn
2004-12-03 7:12 ` TJ
0 siblings, 2 replies; 36+ messages in thread
From: TJ @ 2004-12-02 16:38 UTC (permalink / raw)
To: linux-raid
Hi,
I'm getting horrible performance on my samba server, and I am
unsure of the cause after reading, benchmarking, and tuning.
My server is a K6-500 with 43MB of RAM, standard x86 hardware. The
OS is Slackware 10.0 w/ a 2.6.7 kernel; I've had similar problems with the
2.4.26 kernel. I've listed my partitions below, as well as the drive models.
I have a linear RAID array as a single element of a RAID 5 array. The RAID 5
array holds the filesystem being served by samba. I'm sure that having
one RAID array built on another affects my I/O performance, as does having
root, swap, and a slice of that array all on one drive; however, even taking
this into account, I still can't explain my machine's poor
performance. All drives are on their own IDE channel, no master/slave combos,
as suggested in the RAID howto.
To tune these drives, I use:
hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]
I have tried different values for -a. I use 128, because this corresponds
closely with the 64k stripe of the raid 5 array. I ran hdparm -Tt on each
individual drive as well as both of the raid arrays and included these
numbers below. The numbers I got were pretty low for modern drives.
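For reference, the readahead comparison was just a quick loop along these
lines (the device and the values shown are only examples):
  for ra in 64 128 256 512; do
      hdparm -a $ra /dev/hdc > /dev/null   # set the drive readahead (in sectors)
      hdparm -tT /dev/hdc                  # then re-time cached and buffered reads
  done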
In my dmesg, I'm seeing something odd that I think is determined by kernel
internals: the max request size differs from drive to drive. I believe this
number is controller dependent, so I'm wondering if I have a controller issue
here...
hda: max request size: 128KiB
hdc: max request size: 1024KiB
hdg: max request size: 64KiB
hdi: max request size: 128KiB
hdk: max request size: 1024KiB
I believe my hard drives are somehow not tuned properly, given the low hdparm
numbers, especially for hda and hdc. This is causing the RAID array to perform
poorly in both dbench and hdparm -tT. The fact that the two drives on the same
IDE controller, hda and hdc, perform worse than the rest further suggests that
there may be a controller problem. I may try eliminating this controller
and checking the results again.
Also, I know that VIA chipsets, such as this MVP3, are known for poor PCI
performance. I know that this is tweakable, and several programs exist for
tweaking BIOS registers within Windows. How might I test the PCI bus to see
if it is causing performance problems?
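I'm guessing something like this would at least show what latency timers the
BIOS has programmed for each PCI device, though I don't know what good values
would look like:
  lspci -v | grep -i latency   # prints the latency timer of each PCI device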
Does anyone have any ideas on how to better tune these drives for more
throughput?
My partitions are:
/dev/hda1 on /
/dev/hda2 is swap
/dev/hda3 is part of /dev/md0
/dev/hdi is part of /dev/md0
/dev/hdk is part of /dev/md0
/dev/md0 is a linear array. It is part of /dev/md1
/dev/hdg is part of /dev/md1
/dev/hdc is part of /dev/md1
/dev/md1 is a raid 5 array.
hda: WD 400JB 40GB
hdc: WD 2000JB 200GB
hdg: WD 2000JB 200GB
hdi: IBM 75 GXP 120GB
hdk: WD 1200JB 120GB
Controllers:
hda-c: Onboard controller, VIA VT82C596B (rev 12)
hdd-g: Silicon Image SiI 680 (rev 1)
hdh-k: Promise PDC 20269 (rev 2)
The results from hdparm -tT for each individual drive and each raid array
are:
/dev/hda:
Timing buffer-cache reads: 212 MB in 2.02 seconds = 105.17 MB/sec
Timing buffered disk reads: 42 MB in 3.07 seconds = 13.67 MB/sec
/dev/hdc:
Timing buffer-cache reads: 212 MB in 2.00 seconds = 105.80 MB/sec
Timing buffered disk reads: 44 MB in 3.12 seconds = 14.10 MB/sec
/dev/hdg:
Timing buffer-cache reads: 212 MB in 2.02 seconds = 105.12 MB/sec
Timing buffered disk reads: 68 MB in 3.04 seconds = 22.38 MB/sec
/dev/hdi:
Timing buffer-cache reads: 216 MB in 2.04 seconds = 106.05 MB/sec
Timing buffered disk reads: 72 MB in 3.06 seconds = 23.53 MB/sec
/dev/hdk:
Timing buffer-cache reads: 212 MB in 2.01 seconds = 105.33 MB/sec
Timing buffered disk reads: 66 MB in 3.05 seconds = 21.66 MB/sec
/dev/md0:
Timing buffer-cache reads: 212 MB in 2.01 seconds = 105.28 MB/sec
Timing buffered disk reads: 70 MB in 3.07 seconds = 22.77 MB/sec
/dev/md1:
Timing buffer-cache reads: 212 MB in 2.03 seconds = 104.35 MB/sec
Timing buffered disk reads: 50 MB in 3.03 seconds = 16.51 MB/sec
The results from dbench 1 are: Throughput 19.0968 MB/sec 1 procs
The results from tbench 1 are: Throughput 4.41996 MB/sec 1 procs
I would appreciate any thoughts, leads, ideas, anything at all to point me in
a direction here.
Thanks,
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-02 16:38 TJ
@ 2004-12-03 0:49 ` Mark Hahn
2004-12-03 3:54 ` Guy
` (2 more replies)
2004-12-03 7:12 ` TJ
1 sibling, 3 replies; 36+ messages in thread
From: Mark Hahn @ 2004-12-03 0:49 UTC (permalink / raw)
To: TJ; +Cc: linux-raid
> My server is a K6-500 with 43MB of RAM, standard x86 hardware. The
such a machine was good in its day, but that day was what, 5-7 years ago?
in practical terms, the machine probably has about 300 MB/s of memory
bandwidth (vs 3000 for a low-end server today). further, it was not uncommon
for chipsets to fail to cache then-large amounts of RAM (32M was a common
limit for caches configured writeback, for instance, that would magically
cache 64M if set to writethrough...)
> OS is Slackware 10.0 w/ a 2.6.7 kernel; I've had similar problems with the
with a modern kernel, manual hdparm tuning is unnecessary and probably wrong.
> To tune these drives, I use:
> hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]
if you don't mess with the config via hdparm, what mode do they come up in?
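something along these lines, before any rc.d tuning has run, would show it
(-I just dumps the drive's identify data):
  hdparm /dev/hda                    # current multcount/IO_support/using_dma
  hdparm -I /dev/hda | grep -i dma   # which DMA/UDMA mode is actually selected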
> hda: WD 400JB 40GB
> hdc: WD 2000JB 200GB
> hdg: WD 2000JB 200GB
> hdi: IBM 75 GXP 120GB
> hdk: WD 1200JB 120GB
iirc, the 75GXP has a noticeably lower density (and thus bandwidth).
> Controllers:
> hda-c: Onboard controller, VIA VT82C596B (rev 12)
> hdd-g: Silicon Image SiI 680 (rev 1)
> hdh-k: Promise PDC 20269 (rev 2)
> /dev/hda: Timing buffered disk reads: 42 MB in 3.07 seconds = 13.67 MB/sec
> /dev/hdc: Timing buffered disk reads: 44 MB in 3.12 seconds = 14.10 MB/sec
not that bad for such a horrible controller (and PCI, CPU, memory system)
> /dev/hdg: Timing buffered disk reads: 68 MB in 3.04 seconds = 22.38 MB/sec
> /dev/hdi: Timing buffered disk reads: 72 MB in 3.06 seconds = 23.53 MB/sec
> /dev/hdk: Timing buffered disk reads: 66 MB in 3.05 seconds = 21.66 MB/sec
fairly modern controllers help, but not much.
> /dev/md0: Timing buffered disk reads: 70 MB in 3.07 seconds = 22.77 MB/sec
> /dev/md1: Timing buffered disk reads: 50 MB in 3.03 seconds = 16.51 MB/sec
since the cpu/mem/chipset/bus are limiting factors, raid doesn't help.
> I would appreciate any thoughts, leads, ideas, anything at all to point me in
> a direction here.
keeping a K6 alive is noble and/or amusing, but it's just not reasonable to
expect it to keep up with modern disks. expecting it to run samba well is
not terribly reasonable.
plug those disks into any entry-level machine bought new (celeron, sempron)
and you'll get whiplash. plug those disks into a proper server
(dual-opteron, few GB ram) and you'll never look back. in fact,
you'll start looking for a faster network.
regards, mark hahn.
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-03 0:49 ` Mark Hahn
@ 2004-12-03 3:54 ` Guy
2004-12-03 6:33 ` TJ
2004-12-04 15:23 ` TJ
2004-12-03 6:51 ` TJ
2004-12-03 20:03 ` TJ
2 siblings, 2 replies; 36+ messages in thread
From: Guy @ 2004-12-03 3:54 UTC (permalink / raw)
To: 'Mark Hahn', 'TJ'; +Cc: linux-raid
My linux system is a P3-500 with 2 CPUs and 512 Meg RAM. My system is much
faster than my network. I don't know how your K6-500 compares to my P3-500.
But RAM may be your issue. That amount of ram seems very low. Are you
swapping? What is your CPU load during the tests? If you are at 100%, then
you are CPU bound.
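Something like this while dbench is running would show it (vmstat is just one
way, top works too):
  vmstat 1 10   # watch the cpu columns (user/system/idle, plus iowait if your
                # procps reports it) while the benchmark runs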
Your disk performance is faster than a 100BaseT network. So, your
performance may not be an issue.
My array gives about 60 MB/second.
# hdparm -tT /dev/md2
/dev/md2:
Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec
Timing buffered disk reads: 64 MB in 0.99 seconds = 64.65 MB/sec
# bonnie++ -d . -u 0:0
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
watkins-home 1G 3414 99 30899 66 20449 46 3599 99 77781 74 438.7 9
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 475 98 +++++ +++ 15634 88 501 99 1277 99 1977 98
Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 3:54 ` Guy
@ 2004-12-03 6:33 ` TJ
2004-12-03 7:38 ` Guy
2004-12-04 15:23 ` TJ
1 sibling, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 6:33 UTC (permalink / raw)
To: linux-raid; +Cc: Guy
> My linux system is a P3-500 with 2 CPUs and 512 Meg RAM. My system is much
> faster than my network. I don't know how your K6-500 compares to my
> P3-500. But RAM may be your issue. That amount of ram seems very low. Are
> you swapping? What is your CPU load during the tests? If you are at 100%,
> then you are CPU bound.
You've got a dual CPU setup, mine is only single. I'll bet you have a server
chipset too. Still, I have serious doubts that the CPU is at fault. My guess
would be that this could be a VIA chipset problem. The load averages while
running these tests are always well below 1.
I mistyped the amount of memory. I have approx 409 MB. I am not swapping.
> Your disk performance is faster than a 100BaseT network. So, your
> performance may not be an issue.
The network is gigabit, with a crossover to the client machine. I used ttcp to
verify that the link is capable of over 146 MB/sec.
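The test was just plain ttcp, roughly like this (options from memory, and the
address is only an example):
  ttcp -r -s                # on the receiving machine
  ttcp -t -s 192.168.0.2    # on the sending machine, pointed at the receiver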
TJ
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 0:49 ` Mark Hahn
2004-12-03 3:54 ` Guy
@ 2004-12-03 6:51 ` TJ
2004-12-03 20:03 ` TJ
2 siblings, 0 replies; 36+ messages in thread
From: TJ @ 2004-12-03 6:51 UTC (permalink / raw)
To: linux-raid; +Cc: Mark Hahn
> such a machine was good in its day, but that day was what, 5-7 years ago?
> in practical terms, the machine probably has about 300 MB/s of memory
> bandwidth (vs 3000 for a low-end server today). further, it was not
> uncommon for chipsets to fail to cache then-large amounts of RAM (32M was a
> common limit for caches configured writeback, for instance, that would
> magically cache 64M if set to writethrough...)
You are clearly right that the memory bandwidth is lower than a modern
machine's. However, I still feel that the disk I/O should be much better even
with this limitation. For a straight read, as hdparm -tT does, 300 MB/s of
memory bandwidth should allow for better performance than this.
Guy's numbers on an oldish P3 box support my point.
Additionally, unless you're talking about a box with 64-bit PCI, PCI-X, or PCI
Express, the PCI bus is going to be a severely limiting factor compared to
the memory bus. While the box could do more memory I/O, a disk-bound read
operation should be limited by PCI bandwidth on a new machine just as on
this one.
> with a modern kernel, manual hdparm tuning is unnecessary and probably
> wrong.
I understand why setting DMA and the like is probably unnecessary. For RAID
arrays, though, I would think that setting drive readahead and acoustic
management levels with hdparm, and setting kernel readahead parameters on the
filesystem side, would still be advantageous.
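For the array itself I had something along these lines in mind (the value is
just a guess to experiment with):
  blockdev --getra /dev/md1        # current readahead, in 512-byte sectors
  blockdev --setra 4096 /dev/md1   # try a larger value, then re-run hdparm -t/dbench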
> > To tune these drives, I use:
> > hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]
>
> if you don't mess with the config via hdparm, what mode do they come up in?
> iirc, the 75GXP has a noticeably lower density (and thus bandwidth).
Granted, so why on earth would it perform similarly with hdparm -tT? Even more
confusing, how could it best the newer WD 400JB?
> > /dev/hda: Timing buffered disk reads: 42 MB in 3.07 seconds = 13.67 MB/sec
> > /dev/hdc: Timing buffered disk reads: 44 MB in 3.12 seconds = 14.10 MB/sec
>
> not that bad for such a horrible controller (and PCI, CPU, memory system)
So you do think that the VIA controller is inferior?
> > /dev/md0: Timing buffered disk reads: 70 MB in 3.07 seconds = 22.77 MB/sec
> > /dev/md1: Timing buffered disk reads: 50 MB in 3.03 seconds = 16.51 MB/sec
>
> since the cpu/mem/chipset/bus are limiting factors, raid doesn't help.
Those low RAID numbers do seem to suggest that, don't they?
> keeping a K6 alive is noble and/or amusing, but it's just not reasonable to
> expect it to keep up with modern disks. expecting it to run samba well is
> not terribly reasonable.
>
> plug those disks into any entry-level machine bought new (celeron, sempron)
> and you'll get whiplash. plug those disks into a proper server
> (dual-opteron, few GB ram) and you'll never look back. in fact,
> you'll start looking for a faster network.
I disagree, but I must admit that this is a possibility. My desktop machine is
an Athlon XP 1700+ with 512 MB of RAM on a 266 MHz DDR bus. It's a class above
the K6 easily, with a much better memory subsystem. I could dump all the
drives and controllers onto it and run the same tests using the same kernel
and everything and record the numbers. Do you feel this would prove or
disprove the idea that the box is just underpowered?
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-02 16:38 TJ
2004-12-03 0:49 ` Mark Hahn
@ 2004-12-03 7:12 ` TJ
1 sibling, 0 replies; 36+ messages in thread
From: TJ @ 2004-12-03 7:12 UTC (permalink / raw)
To: linux-raid
Here's some info on the VIA chipset problem.
This is the article that really cracked it; it seems there
are problems with the default timings and latency settings:
http://www.tecchannel.de/hardware/817/8.html
This guy, independent of VIA, came up with some chipset register tweaks to
improve performance of the PCI bus for the broken VIAs:
http://www.georgebreese.com/net/software/
http://www.georgebreese.com/net/software/readmes/vlatency_v020b21_readme.HTM
http://adsl.cutw.net/dlink-dsl200-via.html
The linux kernel sources indicate a problem with VIA chipsets too, but the
author doesn't have it fixed to his satisfaction, obviously.
From linux-2.6.7/drivers/pci/quirks.c:
/* The VIA VP2/VP3/MVP3 seem to have some 'features'. There may be a workaround
   but VIA don't answer queries. If you happen to have good contacts at VIA
   ask them for me please -- Alan
   This appears to be BIOS not version dependent. So presumably there is a
   chipset level fix */
I'm wondering if I could get the tweaks that George Breese has made and
implement them in linux. If I could, how would I benchmark to check for
improvement? Bonnie? hdparm -tT? Some PCI benchmark?
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-03 6:33 ` TJ
@ 2004-12-03 7:38 ` Guy
0 siblings, 0 replies; 36+ messages in thread
From: Guy @ 2004-12-03 7:38 UTC (permalink / raw)
To: 'TJ', linux-raid
Gigabit! Lucky :)
I want a Gigabit switch for Christmas! And a few PCI cards too!
Of course, with Gigabit, I would want/need a better Linux system too! With
PCI-express and ... I better wake up and go to bed! :)
Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
@ 2004-12-03 11:30 TJ
2004-12-03 11:46 ` Erik Mouw
0 siblings, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 11:30 UTC (permalink / raw)
To: linux-raid
>Gigabit! Lucky :)
>
>I want a Gigabit switch for Christmas! And a few PCI cards too!
>
>Of course, with Gigabit, I would want/need a better Linux system too! With
>PCI-express and ... I better wake up and go to bed! :)
Bwahahaha!
I'm cheap. I use a crossover so I didn't have to spring for the switch. The
NICs are Intel 82540EM's. I got them for around $55 per. I didn't think that
was too bad for gigabit. Of course, these controllers may be complete trash,
I dunno.
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 11:30 Looking for the cause of poor I/O performance TJ
@ 2004-12-03 11:46 ` Erik Mouw
2004-12-03 15:09 ` TJ
2004-12-03 16:32 ` David Greaves
0 siblings, 2 replies; 36+ messages in thread
From: Erik Mouw @ 2004-12-03 11:46 UTC (permalink / raw)
To: TJ; +Cc: linux-raid
On Fri, Dec 03, 2004 at 06:30:51AM -0500, TJ wrote:
> I'm cheap. I use a crossover so I didn't have to spring for the switch. The
> NICs are Intel 82540EM's. I got them for around $55 per. I didn't think that
> was too bad for gigabit. Of course, these controllers may be complete trash,
> I dunno.
You won't do any better than fast ethernet when you're using a
crossover cable. Gigabit ethernet doesn't need crossover cables for
direct connections, it uses all four wire pairs in cat5 cable and will
automatically figure out if there's a direct connection and do the
right thing (all mandatory by the gigE standard, so every NIC will
support it). If you use a fast ethernet cross cable, the NICs will
autonegotiate to 100 Mbit/s full-duplex.
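If in doubt about what a link actually negotiated, ethtool will tell you (the
interface name is only an example):
  ethtool eth0 | grep -i 'speed\|duplex'   # reports e.g. Speed: 1000Mb/s, Duplex: Full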
The Intel gigE NICs are very good: good hardware, good driver, good
support. Gigabit ethernet switches are becoming rather cheap: 200 EUR
buys you an 8 port switch.
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 11:46 ` Erik Mouw
@ 2004-12-03 15:09 ` TJ
2004-12-03 16:25 ` Erik Mouw
2004-12-03 16:32 ` David Greaves
1 sibling, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 15:09 UTC (permalink / raw)
To: linux-raid; +Cc: Erik Mouw
> You won't do any better than fast ethernet when you're using a
> crossover cable. Gigabit ethernet doesn't need crossover cables for
> direct connections, it uses all four wire pairs in cat5 cable and will
> automatically figure out if there's a direct connection and do the
> right thing (all mandatory by the gigE standard, so every NIC will
> support it). If you use a fast ethernet cross cable, the NICs will
> autonegotiate to 100 Mbit/s full-duplex.
I did not know that auto-sensing was part of the Gigabit standard. I don't
understand why you would think that performance would be worse with a
crossover than a straight cable, though. I assure you, the link
autonegotiates to a gigabit connection. The card driver reports this, the
card's light indicator reports this, and my benchmarking of throughput has
proven it.
> The Intel gigE NICs are very good: good hardware, good driver, good
> support. Gigabit ethernet switches are becoming rather cheap: 200 EUR
> buys you an 8 port switch.
Yeah, I knew Intel made good NICs, and I knew they were supported under Linux. I'm
only worried because this is the lowest end model in the line. I wonder if it
offloads work to the CPU, causing lower throughput on a busy link, while more
expensive versions handle more work on the card. Also, I have read some
traffic that the e1000 driver is better tuned for light duty connections, and
could use some improvement under a heavy workload. If you know of any
documentation or mailing lists on the topic of tuning this, I'd appreciate
it.
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 15:09 ` TJ
@ 2004-12-03 16:25 ` Erik Mouw
0 siblings, 0 replies; 36+ messages in thread
From: Erik Mouw @ 2004-12-03 16:25 UTC (permalink / raw)
To: TJ; +Cc: linux-raid
On Fri, Dec 03, 2004 at 10:09:23AM -0500, TJ wrote:
> I did not know that auto-sensing was part of the Gigabit standard. I don't
> understand why you would think that performance would be worse with a
> crossover than a straight cable, though. I assure you, the link
> autonegotiates to a gigabit connection. The card driver reports this, the
> card's light indicator reports this, and my benchmarking of throughput has
> proven it.
That means you have a crossover cable with two wire pairs crossed and
two wire pairs straight, and guess what: gigE automatically detects
badly wired cables (to a certain extent), corrects them, and negotiates to
the correct speed: 1 Gbit/s. If you have a crossover cable using only
two crossed wire pairs and the other pairs not connected, the link will
negotiate to 100 Mbit/s.
> > The Intel gigE NICs are very good: good hardware, good driver, good
> > support. Gigabit ethernet switches are becoming rather cheap: 200 EUR
> > buys you an 8 port switch.
>
> Yeah, I knew Intel made good NICs, and I knew they were supported under Linux. I'm
> only worried because this is the lowest end model in the line. I wonder if it
> offloads work to the CPU, causing lower throughput on a busy link, while more
> expensive versions handle more work on the card.
We use the dual ported PCI-X server adapters in the file servers (dual
Athlon and dual Opteron), but to be honest I haven't seen a difference
in performance with the desktop adapters when we replaced them. It's
just that they're 64 bit wide and have two NICs on a single board (and
hence only use one PCI slot). The other machines (about 10 or so) have
the cheaper desktop adapters.
> Also, I have read some traffic that the e1000 driver is better tuned
> for light duty connections, and could use some improvement under a
> heavy workload. If you knew about any documentation, or mailing lists
> on the topic of tuning this, I'd appreciate it.
I can't comment on that. We push several gigabytes/day through the
cards and I haven't seen any real problems. We had performance problems
with NatSemi gigE NICs; Broadcom gigE NICs look like too much driver
hassle to me (judging from posts on linux-kernel).
Documentation can be found on http://sourceforge.net/projects/e1000 ,
the appropriate mailing list is the networking list: netdev@oss.sgi.com .
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
| Data lost? Stay calm and contact Harddisk-recovery.com
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 11:46 ` Erik Mouw
2004-12-03 15:09 ` TJ
@ 2004-12-03 16:32 ` David Greaves
2004-12-03 16:50 ` Guy
1 sibling, 1 reply; 36+ messages in thread
From: David Greaves @ 2004-12-03 16:32 UTC (permalink / raw)
To: Erik Mouw; +Cc: TJ, linux-raid
I paid about £50 for a 5-port gig switch.
I have 3 e1000 cards (about £30 each) - they're relegated to doorstops
I'm afraid :(
Despite months of trying they just won't work with my consumer VIA/AMD
systems (and Ganesh and gang have tried)
I'm now using even cheaper Marvell based SMC EZ1000s (£20ish) - I doubt
I'll get close to the throughput the e1000s could achieve - but I get 3
times more than fast ethernet (and about 10 times more than the e1000s did)
which is worthwhile.
David
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-03 16:32 ` David Greaves
@ 2004-12-03 16:50 ` Guy
0 siblings, 0 replies; 36+ messages in thread
From: Guy @ 2004-12-03 16:50 UTC (permalink / raw)
To: 'David Greaves', 'Erik Mouw'; +Cc: 'TJ', linux-raid
Now I have network envy! I am feeling inadequate. :)
I guess I have some more XMAS ideas! :)
I have about 7+ computers in my house and 2 network printers.
I currently have a 24 port 10/100BaseT switch. A 5 port 100/1000BaseT
switch would be enough to make me happy. I would connect up to 4 computers
to Gigabit, and connect the Gigabit switch to the 100BaseT switch. Sweet!
I guess I better get a full time job!
Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 0:49 ` Mark Hahn
2004-12-03 3:54 ` Guy
2004-12-03 6:51 ` TJ
@ 2004-12-03 20:03 ` TJ
2004-12-04 22:59 ` Mark Hahn
2 siblings, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 20:03 UTC (permalink / raw)
To: linux-raid
> > My server is a K6-500 with 43MB of RAM, standard x86 hardware. The
>
> such a machine was good in its day, but that day was what, 5-7 years ago?
> in practical terms, the machine probably has about 300 MB/s of memory
> bandwidth (vs 3000 for a low-end server today). further, it was not
> uncommon for chipsets to fail to cache then-large amounts of RAM (32M was a
> common limit for caches configured writeback, for instance, that would
> magically cache 64M if set to writethrough...)
Bah. As much as I had hoped to squeeze more out of this box, I think you have
a pretty solid point. I tried out doing a quick hdparm -tT on all of the
drives, using the same OS on the newer Athlon box I described. I got much
better numbers. I included the output below.
> > OS is Slackware 10.0 w/ a 2.6.7 kernel; I've had similar problems with the
>
> with a modern kernel, manual hdparm tuning is unnecessary and probably
> wrong.
>
> > To tune these drives, I use:
> > hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]
I also checked out what hdparm showed for default settings without modifying
them on boot, both on the Athlon, and on the K6. In the case of the Athlon,
you're right, performance suffered from my tuning. In the case of the K6, I
saw no noticeable difference.
It's interesting that on the Athlon, one controller was set to -c1 by default,
while on the K6, the same controller with the same drives is set to -c0 by
default by the exact same kernel. I'm at a loss for how this is determined
and why..
> if you don't mess with the config via hdparm, what mode do they come up in?
I included this for both machines.
I think I will concentrate on tweaking the PCI settings to see if I can't get
a bit more out of that bus and check for any noticeable improvements in
throughput. I'm hoping that this still may be related to some sort of PCI
latency issue.
TJ Harrell
_______________________________________________________________________
Default settings on the Athlon:
75 GXP:
Timing buffer-cache reads: 1124 MB in 2.01 seconds = 560.12 MB/sec
Timing buffered disk reads: 108 MB in 3.02 seconds = 35.74 MB/sec
WD 1200:
Timing buffer-cache reads: 1116 MB in 2.00 seconds = 556.69 MB/sec
Timing buffered disk reads: 108 MB in 3.05 seconds = 35.43 MB/sec
WD 2000:
Timing buffer-cache reads: 1092 MB in 2.01 seconds = 544.45 MB/sec
Timing buffered disk reads: 106 MB in 3.00 seconds = 35.32 MB/sec
WD 400:
Timing buffer-cache reads: 1084 MB in 2.01 seconds = 540.46 MB/sec
Timing buffered disk reads: 122 MB in 3.02 seconds = 40.42 MB/sec
WD 2000:
Timing buffer-cache reads: 1140 MB in 2.00 seconds = 569.52 MB/sec
Timing buffered disk reads: 112 MB in 3.01 seconds = 37.19 MB/sec
Defaults on the K6:
/dev/hda:
multcount = 0 (off)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 65535/16/63, sectors = 78165360, start = 0
/dev/hdc:
multcount = 0 (off)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 24321/255/63, sectors = 390721968, start = 0
/dev/hdg:
multcount = 0 (off)
IO_support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 24321/255/63, sectors = 390721968, start = 0
/dev/hdi:
multcount = 0 (off)
IO_support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 16383/255/63, sectors = 234441648, start = 0
/dev/hdk:
multcount = 0 (off)
IO_support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 65535/16/63, sectors = 90069840, start = 0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 3:54 ` Guy
2004-12-03 6:33 ` TJ
@ 2004-12-04 15:23 ` TJ
2004-12-04 17:59 ` Guy
1 sibling, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-04 15:23 UTC (permalink / raw)
To: linux-raid; +Cc: Guy
On Thursday 02 December 2004 10:54 pm, Guy wrote:
> My linux system is a P3-500 with 2 CPUs and 512 Meg RAM. My system is much
> faster than my network. I don't know how your K6-500 compares to my
> P3-500.
> My array gives about 60MB /second.
Now I'm extremely curious to know why your box does so much better than mine.
Does the bus run at 100? 133? I'm guessing it's SDRAM, not DDR. Also, does it
have a stock PCI bus, or something special?
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-04 15:23 ` TJ
@ 2004-12-04 17:59 ` Guy
2004-12-04 23:51 ` Mark Hahn
0 siblings, 1 reply; 36+ messages in thread
From: Guy @ 2004-12-04 17:59 UTC (permalink / raw)
To: 'TJ', linux-raid
My disks...
My system has 14 disk drives. At 65 MB/second total they are only doing about
5 MB/second each. 6 of the disks are on a 40 MB/second SCSI bus, which
limits my overall speed. During a re-sync I get about 6 MB/second per
disk.
My system...
2 CPUs help. It's a Dell. :) It is what they call a workstation. The
chipset is Intel 440BX (going from memory, so not 100% sure). In its day it
was a high-end system. It has SDRAM and a 100 MHz system bus. All memory
slots are full with the same size DIMMs, so it can interleave if the chipset
supports that. The chipset has 3 PCI buses. Since my overall speed is not
exceeding the speed of 1 PCI bus, I don't think this helps me, but maybe it
does. Everything is SCSI, I don't know if that helps. My disks are on 3
different SCSI busses, 2 Adaptec cards and 1 built-in Adaptec chipset.
This may help, it is a Dell Precision Workstation 410.
http://support.dell.com/support/edocs/systems/deqkmt/specs.htm
If my system is so much faster because of the motherboard design, then cool!
I did not know motherboard design could make such a difference.
The test "hdparm -tT /dev/md2" used about 35% of both CPU's. The test is so
quick it is hard to be sure about the cpu load. I have 17 disks overall, so
I tried hdparm of all of my disks at the same time. This uses 100% of my
CPUs. I don't understand how this can report such high speeds on my 6 disks
on the slow SCSI bus.
Timing buffer-cache reads:
128 MB in 11.18 seconds = 11.45 MB/sec
128 MB in 11.03 seconds = 11.60 MB/sec
128 MB in 10.97 seconds = 11.67 MB/sec
128 MB in 10.91 seconds = 11.73 MB/sec
128 MB in 11.43 seconds = 11.20 MB/sec
128 MB in 11.37 seconds = 11.26 MB/sec
128 MB in 11.35 seconds = 11.28 MB/sec
128 MB in 11.37 seconds = 11.26 MB/sec
128 MB in 11.45 seconds = 11.18 MB/sec
128 MB in 11.97 seconds = 10.69 MB/sec
128 MB in 11.78 seconds = 10.87 MB/sec
128 MB in 11.99 seconds = 10.68 MB/sec
128 MB in 12.26 seconds = 10.44 MB/sec
128 MB in 12.18 seconds = 10.51 MB/sec
128 MB in 11.84 seconds = 10.81 MB/sec
128 MB in 11.84 seconds = 10.81 MB/sec
128 MB in 12.43 seconds = 10.30 MB/sec
Timing buffered disk reads:
64 MB in 9.42 seconds = 6.79 MB/sec
64 MB in 9.62 seconds = 6.65 MB/sec
64 MB in 9.95 seconds = 6.43 MB/sec
64 MB in 9.71 seconds = 6.59 MB/sec
64 MB in 10.17 seconds = 6.29 MB/sec
64 MB in 11.00 seconds = 5.82 MB/sec
64 MB in 11.45 seconds = 5.59 MB/sec
64 MB in 10.81 seconds = 5.92 MB/sec
64 MB in 11.20 seconds = 5.71 MB/sec
64 MB in 11.57 seconds = 5.53 MB/sec
64 MB in 10.89 seconds = 5.88 MB/sec
64 MB in 11.73 seconds = 5.46 MB/sec
64 MB in 11.27 seconds = 5.68 MB/sec
64 MB in 11.20 seconds = 5.71 MB/sec
64 MB in 12.18 seconds = 5.25 MB/sec
64 MB in 11.41 seconds = 5.61 MB/sec
64 MB in 11.91 seconds = 5.37 MB/sec
This is from a single disk:
Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec
Timing buffered disk reads: 64 MB in 3.51 seconds = 18.23 MB/sec
When I test a single disk, they all perform about the same.
A single disk "buffer-cache" performs better than any of my SCSI buses. I
have 2 at 80 Meg/sec and 1 at 40 Meg/sec. The speed exceeds the speed of
the PCI bus. Ok, I understand. I was thinking buffer-cache was the disk
drive's on-board cache, but buffer-cache is the Linux disk cache. I think!
Now I wonder why it is so slow! :)
Anyway, I hope I gave you too much information! :)
Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 20:03 ` TJ
@ 2004-12-04 22:59 ` Mark Hahn
0 siblings, 0 replies; 36+ messages in thread
From: Mark Hahn @ 2004-12-04 22:59 UTC (permalink / raw)
To: TJ; +Cc: linux-raid
> throughput. I'm hoping that this still may be related to some sort of PCI
> latency issue.
you can use setpci to do this sort of tweaking. naturally,
you probably want to mount your filesystems RO when you do this,
since accidents do happen...
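a minimal sketch (the device address and value below are only placeholders;
check lspci -v first to find the right device):
  setpci -s 00:07.1 latency_timer      # read the current latency timer (hex)
  setpci -s 00:07.1 latency_timer=40   # write a new value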
but in abstract, it's quite odd to believe that you can persuade a
7-year old machine to perform as well as a current one. sure, it
sometimes happens, but so very much has changed. as a completely
random example, the topology of bus-like links in a modern box is
vastly less bottlenecked than in the bad old days. for instance,
it was completely normal back then for the chipset's builtin IDE
(and any add-ons) to sit on a single 32-bit/33 MHz PCI segment. that segment
was often unable to actually approach 133 MB/s (80 was pretty good
in those days). and often vendors would bend the rules a little
like adding a bit more delay in arbitration in order to permit
more PCI slots (since PCI, like any multidrop bus, always trades
speed against the number of drops).
nowadays, it's actually common to see unshared pcix slots direct
to a memory-controller-hub, along with unshared connections for
*ATA, sound, even ethernet. not to mention the fact that memory
itself is 10x faster.
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-04 17:59 ` Guy
@ 2004-12-04 23:51 ` Mark Hahn
2004-12-05 1:00 ` Steven Ihde
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Mark Hahn @ 2004-12-04 23:51 UTC (permalink / raw)
To: Guy; +Cc: linux-raid
> Timing buffer-cache reads:
> 128 MB in 11.18 seconds = 11.45 MB/sec
...
>
> Timing buffered disk reads:
> 64 MB in 9.42 seconds = 6.79 MB/sec
...
>
> This is from a single disk:
> Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec
> Timing buffered disk reads: 64 MB in 3.51 seconds = 18.23 MB/sec
excellent! this is really a great example of how a machine's limited
internal bandwidth infringes on your raid performance.
running hdparm -T shows that your machine can manage about 150 MB/s
when simply doing a syscall, copying bytes to userspace, and returning.
no involvement of any IO device. this number is typically about half
the user-visible dram bandwidth as reported by the stream benchmark.
when you try to do parallel IO (either with a bunch of hdparm -t's
or with raid), each disk is desperately trying to write to dram
at about 18 MB/s, ignoring other bottlenecks. alas, we already know
that your available dram bandwidth is much lower than 14*18.
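a crude way to see that ceiling directly, with md out of the picture, is one
sequential reader per disk, timed together (device names are only examples):
  for d in sda sdb sdc sdd; do
      dd if=/dev/$d of=/dev/null bs=1M count=256 &   # one reader per disk
  done
  time wait   # total MB read divided by elapsed time = aggregate bandwidth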
for comparison, a fairly crappy SiS 735-based k7 system with
64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth.
hdparm -T is about 500 MB/s, and would probably have trouble
breaking 200 MB/s with raid0 even if it had enough buses.
an older server of mine is e7500-based, dual xeon/2.4's, with
2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about
500 MB/s hdparm -T, and can sustain 250 MB/s through its 6-disk
raid without any problem.
a newish server (dual-opteron, 2xPC2700) gives 1.4 GB/s under
hdparm -T, and I expect it could hit 600 MB/s without much trouble,
if given 10-12 disks and pcix (or better) controllers...
regards, mark hahn.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-04 23:51 ` Mark Hahn
@ 2004-12-05 1:00 ` Steven Ihde
2004-12-06 17:48 ` Steven Ihde
2004-12-05 2:16 ` Guy
2004-12-05 15:17 ` TJ
2 siblings, 1 reply; 36+ messages in thread
From: Steven Ihde @ 2004-12-05 1:00 UTC (permalink / raw)
To: Mark Hahn; +Cc: Guy, linux-raid
Well while we're on the subject ;-)
I have a three-disk raid5 array. In summary, the raid5 performs
slightly worse than any of the three disks alone. Memory bandwidth
tested by hdparm seems more than adequate (1.6GB/sec). Shouldn't
read-balancing give me some benefit here? Kernel is 2.6.8.
The system is an i865PE (I think) chipset with a 2.4GHz P4. I believe
the memory bandwidth is more than adequate and that the disks are
performing up to spec when tested alone (Seagate Barracudas, hda & hdc
are 80GB PATA, sda is 120GB SATA):
/dev/hda:
Timing cached reads: 3356 MB in 2.00 seconds = 1676.58 MB/sec
Timing buffered disk reads: 122 MB in 3.03 seconds = 40.24 MB/sec
/dev/hdc:
Timing cached reads: 3316 MB in 2.00 seconds = 1657.42 MB/sec
Timing buffered disk reads: 122 MB in 3.02 seconds = 40.34 MB/sec
/dev/sda:
Timing cached reads: 3344 MB in 2.00 seconds = 1673.09 MB/sec
Timing buffered disk reads: 122 MB in 3.04 seconds = 40.19 MB/sec
Now, the raid5 array:
/dev/md1:
Timing cached reads: 3408 MB in 2.00 seconds = 1704.26 MB/sec
Timing buffered disk reads: 114 MB in 3.01 seconds = 37.83 MB/sec
Slightly worse! Bonnie++ gives me an even lower number, about 30.9
MB/sec for sequential input from the raid5.
hda and hdc are attached to the on-board PATA interfaces (one per
channel, no slaves on either channel). sda is attached to the
on-board SATA interface (the other on-board SATA is empty).
A possible clue is that when tested individually but in parallel, hda
and hdc both halve their bandwidth:
/dev/hda:
Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec
Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec
/dev/hdc:
Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec
Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec
/dev/sda:
Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec
Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec
Could there be contention for some shared resource in the on-board
PATA chipset between hda and hdc? Would moving one of them to a
separate IDE controller on a PCI card help?
Am I unreasonable to think that I should be getting better than 37
MB/sec on raid5 read performance, given that each disk alone seems
capable of 40 MB/sec?
Thanks,
Steve
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-04 23:51 ` Mark Hahn
2004-12-05 1:00 ` Steven Ihde
@ 2004-12-05 2:16 ` Guy
2004-12-05 15:14 ` TJ
2004-12-05 15:17 ` TJ
2 siblings, 1 reply; 36+ messages in thread
From: Guy @ 2004-12-05 2:16 UTC (permalink / raw)
To: 'Mark Hahn'; +Cc: linux-raid
Ok, now I am confused.
I have a second Dell Precision Workstation 410:
System A:
CPUs 2 X 500 MHz
RAM 4 X 128 Meg SDRAM
Bus 100 MHz
Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec
System B:
CPUs 2 X 1000 MHz
RAM 4 X 256 Meg Registered SDRAM
Bus 100 MHz
Timing buffer-cache reads: 524 MB in 2.00 seconds =262.00 MB/sec
Why is system B almost twice as fast?
Is registered RAM faster?
I know the CPU speed is twice as fast, but the system bus is still 100 MHz.
There are other differences I don't think would have an effect. Video
cards, modem, SCSI cards, HW RAID card, USB mouse.
Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-05 2:16 ` Guy
@ 2004-12-05 15:14 ` TJ
2004-12-06 21:39 ` Mark Hahn
0 siblings, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-05 15:14 UTC (permalink / raw)
To: linux-raid
On Saturday 04 December 2004 09:16 pm, Guy wrote:
> Ok, now I am confused.
> I have a second Dell Precision Workstation 410:
>
> System A:
> CPUs 2 X 500 MHz
> RAM 4 X 128 Meg SDRAM
> Bus 100 MHz
> Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec
>
> System B:
> CPUs 2 X 1000 MHz
> RAM 4 X 256 Meg Registered SDRAM
> Bus 100 MHz
> Timing buffer-cache reads: 524 MB in 2.00 seconds =262.00 MB/sec
>
> Why is system B almost twice as fast?
> Is registered RAM faster?
> I know the CPU speed is twice as fast, but the system bus is still 100 MHz.
Memory interleaving, perhaps? Registered RAM has higher latency. It's possible
that the machine is set up to do memory interleaving with ECC RAM to boost
performance.
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-04 23:51 ` Mark Hahn
2004-12-05 1:00 ` Steven Ihde
2004-12-05 2:16 ` Guy
@ 2004-12-05 15:17 ` TJ
2004-12-06 21:34 ` Mark Hahn
2 siblings, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-05 15:17 UTC (permalink / raw)
To: linux-raid
> for comparison, a fairly crappy SiS 735-based k7 system with
> 64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth.
> hdparm -T is about 500 MB/s, and would probably have trouble
> breaking 200 MB/s with raid0 even if it had enough buses.
>
> an older server of mine is e7500-based, dual xeon/2.4's, with
> 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about
> 500 MB/s hdparm -T, and can sustain 250 MB/s through it's 6-disk
> raid without any problem.
hmmm.. In these cases, how does the throughput exceed the PCI bandwidth? Do
these boards have multiple busses? PCI-X?
TJ Harrell
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-05 1:00 ` Steven Ihde
@ 2004-12-06 17:48 ` Steven Ihde
2004-12-06 19:29 ` Guy
0 siblings, 1 reply; 36+ messages in thread
From: Steven Ihde @ 2004-12-06 17:48 UTC (permalink / raw)
To: linux-raid
On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote:
[snip]
> A possible clue is that when tested individually but in parallel, hda
> and hdc both halve their bandwidth:
>
> /dev/hda:
> Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec
> Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec
> /dev/hdc:
> Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec
> Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec
> /dev/sda:
> Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec
> Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec
>
> Could there be contention for some shared resource in the on-board
> PATA chipset between hda and hdc? Would moving one of them to a
> separate IDE controller on a PCI card help?
>
> Am I unreasonable to think that I should be getting better than 37
> MB/sec on raid5 read performance, given that each disk alone seems
> capable of 40 MB/sec?
To answer my own question... I moved one of the PATA drives to a PCI
PATA controller. This did enable me to move 40MB/sec simultaneously
from all three drives. Guess there's some issue with the built-in
PATA on the ICH5R southbridge.
However, this didn't help raid5 performance -- it was still about
35-39MB/sec. I also have a raid1 array on the same physical disks,
and observed the same thing there (same read performance as a single
disk with hdparm -tT, about 40 MB/sec). So:
2.6.8 includes the raid1 read balancing fix which was mentioned
previously on this list -- should this show up as substantially better
hdparm -tT numbers for raid1 or is it more complicated than that?
Does raid5 do read-balancing at all or am I just fantasizing?
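I realize a single hdparm stream probably wouldn't exercise raid1 read
balancing anyway; I assume it takes two concurrent readers, something like
this (with md0 only standing in for whatever my raid1 device is really called):
  dd if=/dev/md0 of=/dev/null bs=1M count=512 &
  dd if=/dev/md0 of=/dev/null bs=1M count=512 skip=4096 &   # second reader, 4 GB in
  wait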
Thanks,
Steve
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-06 17:48 ` Steven Ihde
@ 2004-12-06 19:29 ` Guy
2004-12-06 21:10 ` David Greaves
2004-12-06 21:16 ` Steven Ihde
0 siblings, 2 replies; 36+ messages in thread
From: Guy @ 2004-12-06 19:29 UTC (permalink / raw)
To: 'Steven Ihde', linux-raid
RAID5 can't do read balancing. Any one piece of data is only on one drive.
However, RAID5 does do read-ahead; my speed is about 3.5 times as fast as a
single disk: 18 MB/sec for a single disk versus 65 MB/sec for my RAID5 array.
Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-06 19:29 ` Guy
@ 2004-12-06 21:10 ` David Greaves
2004-12-06 23:02 ` Guy
2004-12-06 21:16 ` Steven Ihde
1 sibling, 1 reply; 36+ messages in thread
From: David Greaves @ 2004-12-06 21:10 UTC (permalink / raw)
To: Guy; +Cc: 'Steven Ihde', linux-raid
But aren't the next 'n' blocks of data on (about) n drives, which can be
read concurrently (if the read is big enough)?
Guy wrote:
>RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive.
>However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a
>single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec.
>
>Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-06 19:29 ` Guy
2004-12-06 21:10 ` David Greaves
@ 2004-12-06 21:16 ` Steven Ihde
1 sibling, 0 replies; 36+ messages in thread
From: Steven Ihde @ 2004-12-06 21:16 UTC (permalink / raw)
To: Guy; +Cc: linux-raid
Gotcha. Please excuse the loose use of terminology on my part.
But now I'm more convinced than ever that I should be getting better
performance than I am. I'm getting 40MB/sec from each disk
individually, and I've shown with hdparm that I can pull 40MB/sec from
all three disks simultaneously, yet my raid5 read performance (in a
three-disk array) is still slightly less than 40MB/sec.
Any guesses what the issue could be? Is there a switch for
read-ahead?
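One candidate for such a switch is the per-device readahead setting, e.g.
(an illustrative sketch only, with /dev/md1 standing in for the array
device):
blockdev --getra /dev/md1        # show current readahead, in 512-byte sectors
blockdev --setra 1024 /dev/md1   # raise it to 1024 sectors (512 KiB)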
-Steve
On Mon, 06 Dec 2004 14:29:40 -0500, Guy wrote:
> RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive.
> However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a
> single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec.
>
> Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-05 15:17 ` TJ
@ 2004-12-06 21:34 ` Mark Hahn
2004-12-06 23:06 ` Guy
0 siblings, 1 reply; 36+ messages in thread
From: Mark Hahn @ 2004-12-06 21:34 UTC (permalink / raw)
To: TJ; +Cc: linux-raid
> > for comparison, a fairly crappy SiS 735-based k7 system with
> > 64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth.
> > hdparm -T is about 500 MB/s, and would probably have trouble
> > breaking 200 MB/s with raid0 even if it had enough buses.
> >
> > an older server of mine is e7500-based, dual xeon/2.4's, with
> > 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about
> > 500 MB/s hdparm -T, and can sustain 250 MB/s through it's 6-disk
> > raid without any problem.
>
> hmmm.. In these cases, how does the throughput exceed the PCI bandwidth? Do
> these boards have multiple busses? PCI-X?
I didn't say they exceeded bus bandwidth ("if it had enough").
The latter does actually have multiple buses, some of which are PCI-X;
that's pretty common among server boards.
Interestingly, even non-server desktop parts can exceed PCI bandwidth:
I did a disk server a few years ago that used two chipset ATA ports
and an add-in 4-port card to hit about 150 MB/s total (sustained).
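(For rough numbers: plain 32-bit/33 MHz PCI tops out at 4 bytes x 33 MHz,
about 133 MB/s theoretical, so roughly 100 MB/s of sustained disk traffic in
practice; 64-bit and/or 66 MHz slots raise the ceiling to about 266-533 MB/s,
which is how a multi-bus or PCI-X board can sustain 250 MB/s of raid traffic.)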
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-05 15:14 ` TJ
@ 2004-12-06 21:39 ` Mark Hahn
0 siblings, 0 replies; 36+ messages in thread
From: Mark Hahn @ 2004-12-06 21:39 UTC (permalink / raw)
To: TJ; +Cc: linux-raid
> > System A:
> > CPUs 2 X 500 MHz
> > RAM 4 X 128 Meg SDRAM
> > Bus 100 MHz
> > Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec
> >
> > System B:
> > CPUs 2 X 1000 MHz
> > RAM 4 X 256 Meg Registered SDRAM
> > Bus 100 MHz
> > Timing buffer-cache reads: 524 MB in 2.00 seconds =262.00 MB/sec
> >
> > Why is system B almost twice as fast?
The buffer-cache test is measuring system-call overhead as well as the speed
of pagecache-to-user-buffer copying. Both of those are certainly influenced
by the CPU speed - even by things like MMX.
> > Is registered RAM faster?
No, it's inherently slower (mostly in latency, but since bursts are short,
also in bandwidth).
> Memory interleaving, perhaps? Registered ram has higher latency. It's possible
> that the machine is made to do memory interleaving with ECC ram to boost
> performance..
SDRAM means that interleaving is basically irrelevant; interleaving was
important when a single bank of RAM couldn't sustain one transaction per
cycle (EDO, FPM, etc.).
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-06 21:10 ` David Greaves
@ 2004-12-06 23:02 ` Guy
2004-12-08 9:24 ` David Greaves
0 siblings, 1 reply; 36+ messages in thread
From: Guy @ 2004-12-06 23:02 UTC (permalink / raw)
To: 'David Greaves'; +Cc: 'Steven Ihde', linux-raid
Yes. I did say it reads ahead!
Guy
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves
Sent: Monday, December 06, 2004 4:10 PM
To: Guy
Cc: 'Steven Ihde'; linux-raid@vger.kernel.org
Subject: Re: Looking for the cause of poor I/O performance
but aren't the next 'n' blocks of data on (about) n drives that can be
read concurrently (if the read is big enough)
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-06 21:34 ` Mark Hahn
@ 2004-12-06 23:06 ` Guy
0 siblings, 0 replies; 36+ messages in thread
From: Guy @ 2004-12-06 23:06 UTC (permalink / raw)
To: 'Mark Hahn', 'TJ'; +Cc: linux-raid
Some systems may have 66 MHz or 64-bit PCI, or simply more than one PCI
bus. My desktop system has 3 PCI buses, I think.
Guy
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mark Hahn
Sent: Monday, December 06, 2004 4:35 PM
To: TJ
Cc: linux-raid@vger.kernel.org
Subject: Re: Looking for the cause of poor I/O performance
> > for comparison, a fairly crappy SiS 735-based k7 system with
> > 64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth.
> > hdparm -T is about 500 MB/s, and would probably have trouble
> > breaking 200 MB/s with raid0 even if it had enough buses.
> >
> > an older server of mine is e7500-based, dual xeon/2.4's, with
> > 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about
> > 500 MB/s hdparm -T, and can sustain 250 MB/s through it's 6-disk
> > raid without any problem.
>
> hmmm.. In these cases, how does the throughput exceed the PCI bandwidth? Do
> these boards have multiple busses? PCI-X?
I didn't say they exceeded bus bandwidth ("if it had enough").
the latter does actually have multiple buses, some of which are pcix;
that's pretty common among server boards.
interestingly, even non-server desktop parts can exceed PCI bandwidth -
I did a disk server a few years ago that used two chipset ATA ports
and an add-in 4-port card to hit about 150 MB/s total (sustained).
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-06 23:02 ` Guy
@ 2004-12-08 9:24 ` David Greaves
2004-12-08 18:31 ` Guy
0 siblings, 1 reply; 36+ messages in thread
From: David Greaves @ 2004-12-08 9:24 UTC (permalink / raw)
To: Guy; +Cc: 'Steven Ihde', linux-raid
My understanding of 'readahead' is that when an application asks for 312
bytes of data, the buffering code will anticipate that more data is required
and will fill a buffer (4096 bytes). If we know that apps are really
greedy and read *loads* of data, then we set a large readahead, which will
cause the buffer code (?) to fill a further n buffers/kb according to
the readahead setting. This will all be read sequentially, and the
performance boost is because the read heads on the drive get all the
data in one 'hit' - no unneeded seeks, no rotational latency.
That's not the same as raid5, where, when asked for 312 bytes of data, the
buffering code will fill the 4k buffer and then issue a readahead on
the next n kb of data - which is spread over multiple disks that read
in parallel, not sequentially.
Yes, the readahead triggers this behaviour - but you say "RAID5 can't do
read balancing." - which I thought it could through this mechanism.
It depends whether the original use of "read balancing" in this context
means "selecting a drive to obtain the data from according to the
drive's read queue" (no) or "distributing reads amongst the drives to
obtain a throughput greater than that of one individual drive" (yes)
(OK, the terminology is not quite exact but...)
do we agree? Or have I misunderstood something?
David
Guy wrote:
>Yes. I did say it reads ahead!
>
>Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-08 9:24 ` David Greaves
@ 2004-12-08 18:31 ` Guy
2004-12-08 22:00 ` Steven Ihde
0 siblings, 1 reply; 36+ messages in thread
From: Guy @ 2004-12-08 18:31 UTC (permalink / raw)
To: 'David Greaves'; +Cc: 'Steven Ihde', linux-raid
"read balancing" will help regardless of random or sequential disk access.
It can double your performance (assuming 2 disks).
"read ahead" only helps sequential access, it hurts random access.
Yes, I understand "read balancing" to be balancing the IO over 2 or more
disks, when only 1 disk is really needed. So, you need 2 or more copies of
the data, as in RAID1.
About read ahead...
The physical disks read ahead.
md does read ahead.
Since the disks and md are doing read ahead, you should have more than 1
disk reading at the same time. The physical disks are not very smart about
RAID5: when reading ahead, they will also read the parity data, which is
wasted effort.
With all of the above going on you should get more than 1 disk reading data
at the same time.
With RAID(0, 4, 5 and 6) no one can choose which disk(s) to read. You can't
balance anything. You can only predict what data will be needed before it
is requested. Read ahead does this for large files (sequential reads). I
would not consider this to be "read balancing", just read ahead.
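One way to see the trade-off is to time one large sequential read against a
batch of small scattered reads, with read ahead turned up and then turned
off; a rough bash sketch (the device name and offsets are arbitrary):
time dd if=/dev/md2 of=/dev/null bs=1024k count=1024     # sequential
for i in $(seq 1 500); do                                # scattered 4k reads
  dd if=/dev/md2 of=/dev/null bs=4k count=1 skip=$((RANDOM * 53)) 2>/dev/null
done
A big readahead helps the first case, but in the second it only drags in
data that gets thrown away.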
Guy
-----Original Message-----
From: David Greaves [mailto:david@dgreaves.com]
Sent: Wednesday, December 08, 2004 4:24 AM
To: Guy
Cc: 'Steven Ihde'; linux-raid@vger.kernel.org
Subject: Re: Looking for the cause of poor I/O performance
My understanding of 'readahead' is that when an application asks for 312
bytes of data, the buffering code will anticipate more data is required
and will fill a buffer (4096 bytes). If we know that apps are really
greedy and read *loads* of data then we set a large readahead which will
cause the buffer code (?) to fill a further n buffers/kb according to
the readahead setting. This will all be read sequentially and the
performance boost is because the read heads on the drive get all the
data in one 'hit' - no unneeded seeks, no rotational latency.
That's not the same as raid5 where when asked for 312 bytes of data, the
buffering code wil fill the 4k buffer and then will issue a readahead on
the next n kb of data - which is spread over multiple disks, which read
in parallel, not sequentially.
Yes, the readahead triggers this behaviour - but you say "RAID5 can't do
read balancing." - which I thought it could through this mechanism.
It depends whether the original use of "read balancing" in this context
means "selecting a drive to obtain the data from according to the
drive's read queue" (no) or "distributing reads amongst the drives to
obtain a throughput greater than that of one individual drive" (yes)
(OK, the terminology is not quite exact but...)
do we agree? Or have I misunderstood something?
David
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-08 18:31 ` Guy
@ 2004-12-08 22:00 ` Steven Ihde
2004-12-08 22:25 ` Guy
0 siblings, 1 reply; 36+ messages in thread
From: Steven Ihde @ 2004-12-08 22:00 UTC (permalink / raw)
To: Guy; +Cc: 'David Greaves', linux-raid
OK, between your discussion of read-ahead and Monday's post by Morten
Olsen about /proc/sys/vm/max-readahead, I think I get it now.
I'm using kernel 2.6 so /proc/sys/vm/max-readahead doesn't exist, but
"blockdev --getra/--setra" seems to do the trick. By increasing
readahead on my array device from 256 (the default) to 1024, I can
achieve 80MB/sec sequential read throughput (where before I could get
only 40MB/sec, same as a single disk).
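(blockdev --setra/--getra count 512-byte sectors, so the default of 256 works
out to 128 KiB of readahead and 1024 to 512 KiB.)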
As you point out, while it helps sequential reads it may hurt random
reads, so I'll test a little more and see.
One other point -- apparently 2.6 allows one to set the read-ahead on
a per-device basis (maybe 2.4 does too, I don't know). So would it
make sense to set read-ahead on the disks low (or zero), and readahead
on the MD device high? Perhaps this could allow us to avoid the
overhead of reading unnecessary parity chunks. As the number of disks
increases this would be less and less significant.
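Concretely, the experiment would look something like this (a sketch using the
device names from my setup; they will differ elsewhere):
blockdev --setra 0 /dev/{hdc,hdg,sda}    # member disks: no extra readahead
blockdev --setra 1024 /dev/md1           # md device: large readahead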
-Steve
On Wed, 08 Dec 2004 13:31:27 -0500, Guy wrote:
> "read balancing" will help regardless of random or sequential disk access.
> It can double your performance (assuming 2 disks).
>
> "read ahead" only helps sequential access, it hurts random access.
>
> Yes, I understand "read balancing" to be balancing the IO over 2 or more
> disks, when only 1 disk is really needed. So, you need 2 or more copies of
> the data, as in RAID1.
>
> About read ahead...
> The physical disks read ahead.
> md does read ahead.
> Since the disks and md are doing read ahead, you should have more than 1
> disk reading at the same time. The physical disks are not very smart about
> RAID5, when reading ahead, they will also read the parity data, which is
> wasted effort.
>
> With all of the above going on you should get more than 1 disk reading data
> at the same time.
>
> With RAID(0, 4, 5 and 6) no one can choose which disk(s) to read. You can't
> balance anything. You can only predict what data will be needed before it
> is requested. Read ahead does this for large files (sequential reads). I
> would not consider this to be "read balancing", just read ahead.
>
> Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-08 22:00 ` Steven Ihde
@ 2004-12-08 22:25 ` Guy
2004-12-08 22:41 ` Guy
0 siblings, 1 reply; 36+ messages in thread
From: Guy @ 2004-12-08 22:25 UTC (permalink / raw)
To: 'Steven Ihde'; +Cc: 'David Greaves', linux-raid
Good question!
"One other point -- apparently 2.6 allows one to set the read-ahead on a
per-device basis (maybe 2.4 does too, I don't know). So would it make sense
to set read-ahead on the disks low (or zero), and read ahead on the MD
device high? Perhaps this could allow us to avoid the overhead of reading
unnecessary parity chunks. As the number of disks increases this would be
less and less significant."
I was wondering about this myself.
I have read other people have played with the numbers, but I can't.
# blockdev --getra /dev/md2
1024
# blockdev --setra 2048 /dev/md2
BLKRASET: Invalid argument
# blockdev --setra 1024 /dev/md2
BLKRASET: Invalid argument
I can change read ahead on each drive. I can set read ahead from 0 to 255
on my disks, but this seems to have no effect. My performance using "hdparm
-t /dev/md2" stays about the same.
Odd: I just tried other sizes with md2. I can change read ahead from 0 to
255 there as well, even though it was 1024 before. With read ahead set to 0
on all of my disks and on md2, I still get the same performance. I guess the
on-disk cache read ahead does just fine.
My kernel is 2.4.28.
Guy
-----Original Message-----
From: Steven Ihde [mailto:x-linux-raid@hamachi.dyndns.org]
Sent: Wednesday, December 08, 2004 5:00 PM
To: Guy
Cc: 'David Greaves'; linux-raid@vger.kernel.org
Subject: Re: Looking for the cause of poor I/O performance
OK, between your discussion of read-ahead and Monday's post by Morten
Olsen about /proc/sys/vm/max-readahead, I think I get it now.
I'm using kernel 2.6 so /proc/sys/vm/max-readahead doesn't exist, but
"blockdev --getra/--setra" seems to do the trick. By increasing
readahead on my array device from 256 (the default) to 1024, I can
achieve 80MB/sec sequential read throughput (where before I could get
only 40MB/sec, same as a single disk).
As you point out while it helps sequential reads it may hurt random
reads, so I'll test a little more and see.
One other point -- apparently 2.6 allows one to set the read-ahead on
a per-device basis (maybe 2.4 does too, I don't know). So would it
make sense to set read-ahead on the disks low (or zero), and readahead
on the MD device high? Perhaps this could allow us to avoid the
overhead of reading unecessary parity chunks. As the number of disks
increases this would be less and less significant.
-Steve
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
2004-12-08 22:25 ` Guy
@ 2004-12-08 22:41 ` Guy
2004-12-09 1:40 ` Steven Ihde
0 siblings, 1 reply; 36+ messages in thread
From: Guy @ 2004-12-08 22:41 UTC (permalink / raw)
To: 'Guy', 'Steven Ihde'; +Cc: 'David Greaves', linux-raid
I also tried changing /proc/sys/vm/max-readahead.
I tried the default of 31, 0 and 127. All gave me about the same
performance.
I started testing the speed with the dd command below. It completes in about
12.9 seconds. None of the read ahead changes seem to affect my speed.
Everything is now set to 0, and it's still 12.9 seconds.
12.9 seconds = about 79.38 MB/sec.
time dd if=/dev/md2 of=/dev/null bs=1024k count=1024
Guy
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Guy
Sent: Wednesday, December 08, 2004 5:25 PM
To: 'Steven Ihde'
Cc: 'David Greaves'; linux-raid@vger.kernel.org
Subject: RE: Looking for the cause of poor I/O performance
Good question!
"One other point -- apparently 2.6 allows one to set the read-ahead on a
per-device basis (maybe 2.4 does too, I don't know). So would it make sense
to set read-ahead on the disks low (or zero), and read ahead on the MD
device high? Perhaps this could allow us to avoid the overhead of reading
unnecessary parity chunks. As the number of disks increases this would be
less and less significant."
I was wondering about this myself.
I have read other people have played with the numbers, but I can't.
# blockdev --getra /dev/md2
1024
# blockdev --setra 2048 /dev/md2
BLKRASET: Invalid argument
# blockdev --setra 1024 /dev/md2
BLKRASET: Invalid argument
I can change read ahead on each drive. I can set read ahead from 0 to 255
on my disks, but this seems to have no effect. My performance using "hdparm
-t /dev/md2" stays about the same.
Odd, I just tried other sizes with md2. I can change read ahead from 0 to
255 also. But it was 1024. With read ahead set to 0 on all of my disks and
on md2, I still get the same performance. I guess on on-disk cache read
ahead does just fine.
My kernel is 2.4.28.
Guy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-08 22:41 ` Guy
@ 2004-12-09 1:40 ` Steven Ihde
0 siblings, 0 replies; 36+ messages in thread
From: Steven Ihde @ 2004-12-09 1:40 UTC (permalink / raw)
To: Guy; +Cc: 'David Greaves', linux-raid
On Wed, 08 Dec 2004 17:41:45 -0500, Guy wrote:
> I also tried changing /proc/sys/vm/max-readahead.
> I tried the default of 31, 0 and 127. All gave me about the same
> performance.
>
> I started testing the speed with the dd command below. It complete in about
> 12.9 seconds. None of the read ahead changes seem to affect my speed.
> Everything is now set to 0, still 12.9 seconds.
> 12.9 seconds = about 79.38 MB/sec.
>
> time dd if=/dev/md2 of=/dev/null bs=1024k count=1024
I'm running kernel 2.6.8; I found the readahead setting had a pretty
dramatic effect. I set readahead for all the drives and their
partitions to zero:
blockdev --setra 0 /dev/{hdc,hdg,sda,hdc5,hdg5,sda5}
I tested various readahead values for the array device by reading 1GB
of data from the device using this procedure:
blockdev --flushbufs /dev/md1
blockdev --setra $readahead /dev/md1
dd if=/dev/md1 of=/dev/null bs=1024k count=1024
These are the results:
RA (sectors)   transfer rate (bytes/sec)
------------   -------------------------
           0   15768513
         128   33680867
         256   42982770
         512   59223248
        1024   78590551
        2048   81918844
        4096   82386839
We seem to reach the point of diminishing returns at 1024 readahead,
~80MB/sec throughput. To recap, this is with three Seagate Barracuda
drives, two of which are 80GB PATA, the other a 120GB SATA, in a RAID5
configuration. 256 was the default readahead value. The chunk size
on my array is 32k. I don't know if that has an effect or not.
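The sweep is easy to script if anyone wants to reproduce it; for example (a
bash sketch, adjust the device name to taste):
for ra in 0 128 256 512 1024 2048 4096; do
  blockdev --flushbufs /dev/md1
  blockdev --setra $ra /dev/md1
  echo "readahead = $ra sectors"
  time dd if=/dev/md1 of=/dev/null bs=1024k count=1024
done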
-Steve
^ permalink raw reply [flat|nested] 36+ messages in thread
Thread overview: 36+ messages
2004-12-03 11:30 Looking for the cause of poor I/O performance TJ
2004-12-03 11:46 ` Erik Mouw
2004-12-03 15:09 ` TJ
2004-12-03 16:25 ` Erik Mouw
2004-12-03 16:32 ` David Greaves
2004-12-03 16:50 ` Guy
-- strict thread matches above, loose matches on Subject: below --
2004-12-02 16:38 TJ
2004-12-03 0:49 ` Mark Hahn
2004-12-03 3:54 ` Guy
2004-12-03 6:33 ` TJ
2004-12-03 7:38 ` Guy
2004-12-04 15:23 ` TJ
2004-12-04 17:59 ` Guy
2004-12-04 23:51 ` Mark Hahn
2004-12-05 1:00 ` Steven Ihde
2004-12-06 17:48 ` Steven Ihde
2004-12-06 19:29 ` Guy
2004-12-06 21:10 ` David Greaves
2004-12-06 23:02 ` Guy
2004-12-08 9:24 ` David Greaves
2004-12-08 18:31 ` Guy
2004-12-08 22:00 ` Steven Ihde
2004-12-08 22:25 ` Guy
2004-12-08 22:41 ` Guy
2004-12-09 1:40 ` Steven Ihde
2004-12-06 21:16 ` Steven Ihde
2004-12-05 2:16 ` Guy
2004-12-05 15:14 ` TJ
2004-12-06 21:39 ` Mark Hahn
2004-12-05 15:17 ` TJ
2004-12-06 21:34 ` Mark Hahn
2004-12-06 23:06 ` Guy
2004-12-03 6:51 ` TJ
2004-12-03 20:03 ` TJ
2004-12-04 22:59 ` Mark Hahn
2004-12-03 7:12 ` TJ