* parallelism of device use in md
From: Andy Smith @ 2006-01-17 12:09 UTC (permalink / raw)
To: linux-raid
I'm wondering: how well does md currently make use of the fact there
are multiple devices in the different (non-parity) RAID levels for
optimising reading and writing?
For example, are *writes* to a 2 device RAID-0 approaching twice as
fast as to a single device? If not, are they any faster at all?
Are reads from a 2 device RAID-1 twice as fast as from a single
device? If there are benefits, how quickly do they degrade to
nothing as disks are added?
What does the picture look like for reads and writes to a 4 device
RAID-10?
Sorry if my subject line isn't clear, but I couldn't think of a
better way to put it.
Thanks,
Andy

* Re: parallelism of device use in md
From: Neil Brown @ 2006-01-17 23:04 UTC (permalink / raw)
To: Andy Smith; +Cc: linux-raid

On Tuesday January 17, andy@lug.org.uk wrote:
> I'm wondering: how well does md currently make use of the fact there
> are multiple devices in the different (non-parity) RAID levels for
> optimising reading and writing?

It does the best it can. Every request from the filesystem goes
directly to the device it should. Of course, if all the blocks the
filesystem requests happen to be on the same drive, there isn't a lot
that md can do...

> For example, are *writes* to a 2 device RAID-0 approaching twice as
> fast as to a single device? If not, are they any faster at all?
> Are reads from a 2 device RAID-1 twice as fast as from a single
> device? If there are benefits, how quickly do they degrade to
> nothing as disks are added?

Yes. Under a reasonably heavy load, both reads and writes will be
close to twice as fast on a 2 device RAID-0 as on a single device
(provided the bus doesn't become a bottleneck).

> What does the picture look like for reads and writes to a 4 device
> RAID-10?

Much the same.

> Sorry if my subject line isn't clear, but I couldn't think of a
> better way to put it.

Clear enough.

NeilBrown
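
A minimal way to check the streaming numbers Neil describes, assuming
/dev/md0 is the 2 device RAID-0 and /dev/sda is one of its members (the
device names and sizes here are only illustrative, and iflag=direct
needs a reasonably recent GNU dd):

    # sequential read from the RAID-0, bypassing the page cache
    dd if=/dev/md0 of=/dev/null bs=1M count=2048 iflag=direct
    # the same read from a single member, for comparison
    dd if=/dev/sda of=/dev/null bs=1M count=2048 iflag=direct

A write comparison works the same way with of= pointing at a scratch
array, never at a device holding data you care about.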

* Re: parallelism of device use in md
From: Tim Moore @ 2006-01-18 0:23 UTC (permalink / raw)
To: linux-raid

Andy Smith wrote:
> ...
> For example, are *writes* to a 2 device RAID-0 approaching twice as
> fast as to a single device? If not, are they any faster at all?
> Are reads from a 2 device RAID-1 twice as fast as from a single
> device? If there are benefits, how quickly do they degrade to
> nothing as disks are added?

Server development where I work uses a 3-way mirror for the system
bits, but that would be a costly solution for any real amount of
storage and/or write performance.

Here is the tail end of a pair of 120GB WD SATA-I drives (SiI3112
chipset, sata_sil driver, 2.4.32 kernel). A three- or four-way stripe
should gain proportionally more, provided the drives are on separate
controller channels; of course, risk scales with performance.

[16:05] abit:~ > cat /proc/mdstat | head -7
Personalities : [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md14 : active raid0 sdb13[1] sda13[0]
      114575616 blocks 32k chunks
md13 : active raid1 sdb12[1] sda12[0]
      20113216 blocks [2/2] [UU]

[16:06] abit:~ > hdparm -tT /dev/{md14,sd{a,b}13,md13,sd{a,b}12}

/dev/md14:
 Timing buffer-cache reads:  1908 MB in 2.00 seconds = 954.00 MB/sec
 Timing buffered disk reads:  272 MB in 3.01 seconds =  90.37 MB/sec

/dev/sda13:
 Timing buffer-cache reads:  1904 MB in 2.00 seconds = 952.00 MB/sec
 Timing buffered disk reads:  156 MB in 3.01 seconds =  51.83 MB/sec

/dev/sdb13:
 Timing buffer-cache reads:  1912 MB in 2.00 seconds = 956.00 MB/sec
 Timing buffered disk reads:  136 MB in 3.00 seconds =  45.33 MB/sec

/dev/md13:
 Timing buffer-cache reads:  1876 MB in 2.00 seconds = 938.00 MB/sec
 Timing buffered disk reads:  164 MB in 3.00 seconds =  54.67 MB/sec

/dev/sda12:
 Timing buffer-cache reads:  1904 MB in 2.00 seconds = 952.00 MB/sec
 Timing buffered disk reads:  166 MB in 3.02 seconds =  54.97 MB/sec

/dev/sdb12:
 Timing buffer-cache reads:  1892 MB in 2.00 seconds = 946.00 MB/sec
 Timing buffered disk reads:  146 MB in 3.00 seconds =  48.67 MB/sec

[16:08] abit:~ >
--
 | for direct mail add "private_" in front of user name

* Re: parallelism of device use in md
From: Mario 'BitKoenig' Holbe @ 2006-01-18 7:41 UTC (permalink / raw)
To: linux-raid

Tim Moore <linux-raid@nsr500.net> wrote:
> Andy Smith wrote:
>> Are reads from a 2 device RAID-1 twice as fast as from a single
> md14 : active raid0 sdb13[1] sda13[0]
> md13 : active raid1 sdb12[1] sda12[0]
>
> /dev/md14:
>  Timing buffered disk reads:  272 MB in 3.01 seconds = 90.37 MB/sec
> /dev/md13:
>  Timing buffered disk reads:  164 MB in 3.00 seconds = 54.67 MB/sec

And this is exactly the odd thing that I am also seeing and that has
been asked about many times on this list already, IIRC: why is the
single-stream read performance of a RAID1 so much worse than the read
performance of a RAID0?

A RAID1 should easily be able to match (or perhaps even exceed, since
it is not bound to chunk boundaries) the read performance of a RAID0.
As far as I can see, RAID1 only gets there when lots of read requests
are scheduled in parallel. Would it perhaps make sense to split a
single read across all mirrors that are currently idle?

regards
Mario
--
I've never been certain whether the moral of the Icarus story should
only be, as is generally accepted, "Don't try to fly too high," or
whether it might also be thought of as, "Forget the wax and feathers
and do a better job on the wings."  -- Stanley Kubrick
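
A quick way to see the parallel-read behaviour Mario describes, assuming
/dev/md0 is a 2 device RAID-1 with members sda and sdb (the device names,
sizes and the 8GiB offset are only illustrative):

    # two concurrent sequential readers, far apart on the array
    dd if=/dev/md0 of=/dev/null bs=1M count=1024 &
    dd if=/dev/md0 of=/dev/null bs=1M count=1024 skip=8192 &
    wait

Watching /proc/diskstats (or iostat, if sysstat is installed) while this
runs should show both members serving reads, whereas a single stream is
typically served from one member only.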

* Re: parallelism of device use in md
From: Mario 'BitKoenig' Holbe @ 2006-01-18 8:16 UTC (permalink / raw)
To: linux-raid

Mario 'BitKoenig' Holbe <Mario.Holbe@TU-Ilmenau.DE> wrote:
> are scheduled in parallel. Would it perhaps make sense to split a
> single read across all mirrors that are currently idle?

Ah, I got it from the other thread - seek times :)

Perhaps using some big (virtual) chunk size could do the trick. What
about using chunks so big that the data transfer takes longer than the
seek... assuming a data rate of 50MB/s, a 9ms average seek time would
call for chunks of at least 500kB, and a 14ms average seek time for
chunks of at least 750kB. However, since the blocks being read are
most likely fairly close together, it is not a typical average seek,
so even smaller chunks might do.

regards
Mario
--
<Sique> Huch? 802.1q? Was sucht das denn hier? Wie kommt das ans
TAGgeslicht?
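
As a back-of-the-envelope check of those figures (the 50MB/s rate and
the seek times are Mario's assumptions, not measurements): the chunk
needs to be large enough that transferring it takes at least as long as
the average seek spent reaching it, i.e.

    chunk size >= data rate * seek time
                = 50 MB/s * 0.009 s = 450 kB  (rounded up to ~500 kB)
                = 50 MB/s * 0.014 s = 700 kB  (rounded up to ~750 kB)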

* Re: parallelism of device use in md
From: Francois Barre @ 2006-01-18 17:55 UTC (permalink / raw)
To: linux-raid

2006/1/18, Mario 'BitKoenig' Holbe <Mario.Holbe@tu-ilmenau.de>:
> Mario 'BitKoenig' Holbe <Mario.Holbe@TU-Ilmenau.DE> wrote:
> > are scheduled in parallel. Would it perhaps make sense to split a
> > single read across all mirrors that are currently idle?
>
> Ah, I got it from the other thread - seek times :)
> Perhaps using some big (virtual) chunk size could do the trick. What
> about using chunks so big that the data transfer takes longer than the
> seek... assuming a data rate of 50MB/s, a 9ms average seek time would
> call for chunks of at least 500kB, and a 14ms average seek time for
> chunks of at least 750kB. However, since the blocks being read are
> most likely fairly close together, it is not a typical average seek,
> so even smaller chunks might do.
>
> regards
> Mario

Stop me if I'm wrong, but this is called... huge readahead. Instead of
reading 32k on drive0 and then 32k on drive1, you read a continuous
512k from drive0 (16*32k) and 512k from drive1, resulting in a 1M read.
Maybe for a single 4k page...

So my additional question would be: how well does md fit in with
Linux's and the filesystems' readahead policies?
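
For anyone who wants to see the readahead values involved, blockdev
(also mentioned later in this thread) reports and sets them per block
device; here /dev/md0, /dev/sda and /dev/sdb are only example names and
2048 is an arbitrary value (units are 512-byte sectors):

    blockdev --getra /dev/md0 /dev/sda /dev/sdb
    blockdev --setra 2048 /dev/md0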

* Re: parallelism of device use in md
From: Neil Brown @ 2006-01-18 23:34 UTC (permalink / raw)
To: Francois Barre; +Cc: linux-raid

On Wednesday January 18, francois.barre@gmail.com wrote:
> 2006/1/18, Mario 'BitKoenig' Holbe <Mario.Holbe@tu-ilmenau.de>:
> > Mario 'BitKoenig' Holbe <Mario.Holbe@TU-Ilmenau.DE> wrote:
> > > are scheduled in parallel. Would it perhaps make sense to split a
> > > single read across all mirrors that are currently idle?
> >
> > Ah, I got it from the other thread - seek times :)
> > Perhaps using some big (virtual) chunk size could do the trick. What
> > about using chunks so big that the data transfer takes longer than
> > the seek... assuming a data rate of 50MB/s, a 9ms average seek time
> > would call for chunks of at least 500kB, and a 14ms average seek
> > time for chunks of at least 750kB. However, since the blocks being
> > read are most likely fairly close together, it is not a typical
> > average seek, so even smaller chunks might do.
> >
> > regards
> > Mario
>
> Stop me if I'm wrong, but this is called... huge readahead. Instead of
> reading 32k on drive0 and then 32k on drive1, you read a continuous
> 512k from drive0 (16*32k) and 512k from drive1, resulting in a 1M
> read. Maybe for a single 4k page...
>
> So my additional question would be: how well does md fit in with
> Linux's and the filesystems' readahead policies?

The read balancing in raid1 is clunky at best. I've often thought
"there must be a better way", but I've never worked out what the
better way might be (though I haven't tried very hard).

If anyone would like to experiment with the read-balancing code,
suggest and test changes, it would be most welcome.

NeilBrown
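
For anyone taking Neil up on that, the RAID1 read balancing lives in the
raid1 personality; in a kernel source tree the routine can be located
with something like the following (the function name is from memory, so
check it against your tree):

    grep -n read_balance drivers/md/raid1.c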

* Re: parallelism of device use in md
From: Tuomas Leikola @ 2006-01-22 16:43 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

On 1/19/06, Neil Brown <neilb@suse.de> wrote:
> The read balancing in raid1 is clunky at best. I've often thought
> "there must be a better way", but I've never worked out what the
> better way might be (though I haven't tried very hard).
>
> If anyone would like to experiment with the read-balancing code,
> suggest and test changes, it would be most welcome.

An interesting and desperately complex topic, intertwined with IO
schedulers in general. I'll follow up with my 2 cents.

The way I see it, there are two fundamentally different approaches:
optimize for throughput or optimize for latency.

When optimizing for latency, the balancer would always choose the
device that can serve a request in the shortest time. This is close to
what the current code does, although it doesn't seem to account for the
length of each device's pending request queue. (I'd estimate that for a
traditional ATA disk, 2-3 short-seek requests are worth one long seek,
because of spindle latency.) I'd assume "fair", in-order service for
the latency mode.

When optimizing for throughput, the balancer would choose the device
whose total queue completion time increases the least. This implies
reordering of requests and so on. For a queue depth of 1, the
throughput balancer would pick the "closest" available device as long
as the devices are idle; when they are all busy, it would leave the
requests in an array-wide queue until one of the devices becomes
available, and then dequeue the request that device can serve fastest
(or one whose deadline has been exceeded).

Both approaches become difficult once device queues are taken into
account. The throughput balancer, as described, could just estimate how
close the new request is to the others already queued on each device,
and pick the device whose queued work is nearby. The latency scheduler
is probably pretty much useless in this scenario, as its definition
breaks down once requests can push each other around. I'd expect it to
be useful in the common desktop configuration with no device queues,
though.

One thing I'd like to see is more powerful estimates of request cost
for a device. It's possible, if not practical, to profile devices for
things like spindle latency and sector locations. If this cost
estimation data is accurate enough, per-device queues become less
important as performance factors. As it is now, one can only hope that
requests which are near LBA-wise are also near time-wise, which is not
true for most devices.

Yes, I know it's mostly wishful thinking. Measurements would be tricky
and would produce complex maps for estimating costs, and (I think)
would be virtually impossible to get right for anything with device
queues. I'd expect that no drives on the market expose this kind of
latency estimation data to the controller or the OS. I'd also expect
that high-end storage vendors use the very same information in their
hardware RAID implementations to provide better queuing and load
balancing.

Both of the described balancer algorithms can be implemented fairly
easily and (I'd expect) will work reasonably well with common desktop
drives. They could be optional (like the IO schedulers currently are),
and different cost estimation algorithms could also be optional (and
tunable, if autotuning is out of the question).

Unfortunately my kernel hacking skills are too weak for most of this -
there needs to be someone else who's interested enough.
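
For reference on the "optional, like the IO schedulers" comparison:
recent 2.6 kernels already expose a per-device scheduler switch via
sysfs (the device name below is only an example):

    cat /sys/block/sda/queue/scheduler      # e.g. "noop anticipatory deadline [cfq]"
    echo deadline > /sys/block/sda/queue/scheduler

Something similar could presumably carry a per-array read-balance
policy, though as far as I know md exposes no such knob today.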

* Re: parallelism of device use in md
From: Mario 'BitKoenig' Holbe @ 2006-01-19 11:30 UTC (permalink / raw)
To: linux-raid

Francois Barre <francois.barre@gmail.com> wrote:
> 2006/1/18, Mario 'BitKoenig' Holbe <Mario.Holbe@tu-ilmenau.de>:
>> Mario 'BitKoenig' Holbe <Mario.Holbe@TU-Ilmenau.DE> wrote:
>> Perhaps using some big (virtual) chunk size could do the trick.
> Stop me if I'm wrong, but this is called... huge readahead. Instead of
> reading 32k on drive0 and then 32k on drive1, you read a continuous
> 512k from drive0 (16*32k) and 512k from drive1, resulting in a 1M read.
> Maybe for a single 4k page...

Yes, that would be the consequence. However, it would probably not be a
big issue, since:

a) the current default read-ahead for RAID1 is already 1024 (in
   512-byte sectors) anyway, and

b) in the hardware-RAID world, at least 3ware support recommends huge
   read-aheads for speeding up their RAID1s, too... AFAIK they
   recommend:

       vm.{min,max}-readahead=512
       blockdev --setra 6144 /dev/...

   which is far more than 1M. I don't know why they do so, but I could
   imagine they use a strategy similar to the one I suggested.

regards
Mario
--
Ho ho ho! I am Santa Claus of Borg. Nice assimilation all together!

* Re: parallelism of device use in md
From: Andy Smith @ 2006-01-18 9:50 UTC (permalink / raw)
To: linux-raid

On Tue, Jan 17, 2006 at 12:09:27PM +0000, Andy Smith wrote:
> I'm wondering: how well does md currently make use of the fact there
> are multiple devices in the different (non-parity) RAID levels for
> optimising reading and writing?

Thanks all for your answers.