* to understand the logic of raid0_make_request
@ 2006-06-13 0:37 liu yang
2006-06-13 0:49 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: liu yang @ 2006-06-13 0:37 UTC (permalink / raw)
To: linux-raid
Hello, everyone.
I am studying the raid0 code, but I find the logic of
raid0_make_request a little difficult to understand.
Can anyone tell me what the function raid0_make_request will eventually do?
Regards.
Thanks!
YangLiu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: to understand the logic of raid0_make_request 2006-06-13 0:37 to understand the logic of raid0_make_request liu yang @ 2006-06-13 0:49 ` Neil Brown 2006-06-13 1:18 ` RAID tuning? Adam Talbot ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Neil Brown @ 2006-06-13 0:49 UTC (permalink / raw) To: liu yang; +Cc: linux-raid
On Tuesday June 13, liudows2@gmail.com wrote:
> Hello, everyone.
> I am studying the raid0 code, but I find the logic of
> raid0_make_request a little difficult to understand.
> Can anyone tell me what the function raid0_make_request will eventually do?

One of two possibilities.

Most often it will update bio->bi_bdev and bio->bi_sector to refer to
the correct location on the correct underlying device, and then
will return '1'.
The fact that it returns '1' is noticed by generic_make_request in
block/ll_rw_blk.c, and generic_make_request will loop around and
retry the request on the new device at the new offset.

However, in the unusual case that the request crosses a chunk boundary
and so needs to be sent to two different devices, raid0_make_request
will split the bio in two (using bio_split) and submit each of the
two bios directly down to the appropriate devices - and will then
return '0', so that generic_make_request doesn't loop around.

I hope that helps.

NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
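A user-space illustration of the contract Neil describes - toy types and made-up constants (CHUNK_SECTS, NDISKS, the toy_bio struct), not the kernel's structures or the real raid0_make_request - showing the two return paths and how a generic_make_request-style loop reacts to them:

/*
 * Toy model of the contract between generic_make_request() and a
 * personality's make_request function, as described above.  The types,
 * constants and the single-zone mapping are invented for illustration;
 * this is not the kernel code.
 */
#include <stdio.h>

struct toy_bio {
    int  dev;        /* stands in for bio->bi_bdev: -1 means "still the array" */
    long sector;     /* stands in for bio->bi_sector */
    long sectors;    /* request length in sectors */
};

#define CHUNK_SECTS 128        /* 64KB chunks */
#define NDISKS      2

static int toy_raid0_make_request(struct toy_bio *bio);

static void toy_generic_make_request(struct toy_bio *bio)
{
    /* The real loop is in generic_make_request(): while the personality
     * returns 1, retry the (now remapped) bio on its new target. */
    while (toy_raid0_make_request(bio))
        ;
}

static int toy_raid0_make_request(struct toy_bio *bio)
{
    long chunk = bio->sector / CHUNK_SECTS;
    long last  = (bio->sector + bio->sectors - 1) / CHUNK_SECTS;

    if (last != chunk) {
        /* Unusual case: the request crosses a chunk boundary.  Split it
         * (bio_split in the kernel), push both halves down directly,
         * and return 0 so the caller's loop stops. */
        long first = CHUNK_SECTS - bio->sector % CHUNK_SECTS;
        struct toy_bio a = { bio->dev, bio->sector,         first };
        struct toy_bio b = { bio->dev, bio->sector + first, bio->sectors - first };
        toy_generic_make_request(&a);
        toy_generic_make_request(&b);
        return 0;
    }

    if (bio->dev < 0) {
        /* Common case: remap onto a component device and return 1 so
         * generic_make_request() retries it there. */
        bio->dev    = chunk % NDISKS;
        bio->sector = (chunk / NDISKS) * CHUNK_SECTS + bio->sector % CHUNK_SECTS;
        return 1;
    }

    /* Already remapped to a real disk: in the kernel this would now hit
     * that disk's own make_request function. */
    printf("submit: disk %d, sector %ld, %ld sectors\n",
           bio->dev, bio->sector, bio->sectors);
    return 0;
}

int main(void)
{
    struct toy_bio req = { -1, 100, 56 };   /* straddles the 128-sector boundary */
    toy_generic_make_request(&req);
    return 0;
}

Running it prints one "submit" line per component device, which is exactly the split-then-submit behaviour of the return-0 path; a request that stays inside one chunk produces a single submit line via the return-1 path.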
* RAID tuning? 2006-06-13 0:49 ` Neil Brown @ 2006-06-13 1:18 ` Adam Talbot 2006-06-13 10:19 ` Gordon Henderson 2006-06-13 18:45 ` to understand the logic of raid0_make_request Bill Davidsen 2006-06-16 2:53 ` liu yang 2 siblings, 1 reply; 22+ messages in thread From: Adam Talbot @ 2006-06-13 1:18 UTC (permalink / raw) To: linux-raid
RAID tuning?
Just got my new array set up, running RAID 6 on 6 disks. Now I am looking
to tune it. I am still testing and playing with it, so I don't mind
rebuilding the array a few times.

Is chunk size per disk or is it the total stripe?
Will using mkfs.ext3 -b N make a performance difference?
Are there any other things that I should keep in mind?
-Adam
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID tuning? 2006-06-13 1:18 ` RAID tuning? Adam Talbot @ 2006-06-13 10:19 ` Gordon Henderson 2006-06-13 10:21 ` Justin Piszcz 0 siblings, 1 reply; 22+ messages in thread From: Gordon Henderson @ 2006-06-13 10:19 UTC (permalink / raw) To: Adam Talbot; +Cc: linux-raid
On Mon, 12 Jun 2006, Adam Talbot wrote:

> RAID tuning?
> Just got my new array set up, running RAID 6 on 6 disks. Now I am looking
> to tune it. I am still testing and playing with it, so I don't mind
> rebuilding the array a few times.
>
> Is chunk size per disk or is it the total stripe?

As I understand it, (raid0,5&6,?) the chunk size is the amount of data the
system will write to one disk before moving to the next disk. It probably
affects all sorts of underlying "stuff" like buffers at the block level
and so on. I guess the theory is that if you can fill up a chunk on N
disks, and the kernel+hardware is capable of writing them in parallel (or
as near as it can), then the overall write speed increases.

I also think that the order you give the drives to mdadm makes a difference
- if they are on different controllers - and I asked this question
recently, but didn't get any answers to it... An array I created this way
recently, 2 SCSI chains, 7 drives on each, created using sda, sdh, sdb,
sdi, etc. seems to come out as sda, sdb, sdc in /proc/mdstat, so who
knows!

> Will using mkfs.ext3 -b N make a performance difference?

I've found that it does - if you get the right numbers, and not so much
the -b option (that's probably optimal for most systems) but the -R
stride=N option, as then ext3 can "know" about the chunk size and
hopefully optimise its writes against that.

A lot also depends on your usage patterns. If you're doing a lot of stuff
with small files, or small writes, then a smaller (or the default) chunk
size might be just fine - for streaming larger files, a larger chunk
size (& more memory ;-) might help - since you've got time to play here, I
suggest you do :)

One thing I did find some time back though (and I never tracked it down,
as I didn't have the time, alas) was some data corruption when using
anything other than the default stripe size (or more correctly, the default
-R stride= option in mkfs.ext3 - I think it might have been ext3 rather
than the md stuff - this was an older server with older ext2 tools and a
2.6 kernel). I put together a quick & dirty little script to test this
though - see

http://lion.drogon.net/diskCheck1

I usually run this a few times on a new array, and will leave it running
for as long as possible on all partitions if I can. Under ext3, I'll fill
the disk with random files, unmount it, force an fsck on it, then
mount/delete all files/umount it, then do the fsck again. That script
basically creates a random file, then copies this all over the disk,
copying the copy each time.

As for more tuning, & ext3, there is a section in the (rather old now!)
HowTo about the numbers.

http://www.tldp.org/HOWTO/Software-RAID-HOWTO-5.html#ss5.11

> Are there any other things that I should keep in mind?

Test it, thrash it, power-kill it, remove a drive randomly, fsck it,
rebuild it, etc. before it goes live. Be as intimately familiar with the
"administration" side of things as you can, so that if something does go
wrong, you can calmly analyse what's wrong, remove a drive (if necessary)
and install & resync/build a new one if required.

I've found it does help to do an fsck on it when it's reasonably full of
files - if nothing else, it'll give you an indication of how long it will
take should you ever have to do it for real. If you're dealing with
clients/managers/users, etc. it's always good to give them an idea, then
you can go off and have a relaxing coffee or 2 ;-)

And FWIW: I've been using RAID-6 on production servers for some time now
and have been quite happy. It did save the day once when someone performed
a memory upgrade on one server and somehow managed to boot the server with
2 drives missing )-: A reboot after a cable-check and a resync later and
all was fine. (Is it just me, or are SATA cables and connectors rubbish?)

Here's one I built earlier: (SCSI drives)

md9 : active raid6 sdn1[13] sdm1[11] sdl1[9] sdk1[7] sdj1[5] sdi1[3]
sdh1[1] sdg1[12] sdf1[10] sde1[8] sdd1[6] sdc1[4] sdb1[2] sda1[0]
      3515533824 blocks level 6, 128k chunk, algorithm 2 [14/14]
[UUUUUUUUUUUUUU]

Filesystem            Size  Used Avail Use% Mounted on
/dev/md9              3.3T  239G  3.1T   8% /mounts/pdrive

It's actually running XFS rather than ext3 - I did manage to do some
testing with this, and the only thing that swayed me towards XFS for this
box was its ability to delete files much, much quicker than ext3 (and
this box might well require large quantities of files/directories to be
deleted on a semi-regular basis). I used:

mkfs -t xfs -f -d su=128k,sw=14 /dev/md9

to create it - I *think* those parameters are OK for a 128K chunk size
array - it was hard to pin this down, but performance seems adequate.
(it's a write-by-one-process, once or twice a day, read-by-many sort of
data store)

I use stock Debian 3.1 & the mdadm that comes with it, but a custom
compiled 2.6.16.x kernel.

Good luck!

Gordon
^ permalink raw reply [flat|nested] 22+ messages in thread
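To make the arithmetic behind those options concrete, here is a small helper that prints candidate mkfs values from the md chunk size and disk count. The mappings it assumes - ext3 stride = chunk size / filesystem block size, XFS su = chunk size, sw = data disks (total minus parity) - are common rules of thumb rather than anything authoritative, and note that the 14-drive box above used sw=14 (the total drive count), so conventions clearly vary; benchmark rather than trust the printed numbers.

/* Rule-of-thumb mkfs parameters from md geometry.  The stride and su/sw
 * mappings below are assumptions, not authoritative; benchmark them. */
#include <stdio.h>

int main(void)
{
    int chunk_kb = 64;   /* md chunk size (per disk), in KB */
    int block_kb = 4;    /* ext3 block size, in KB          */
    int disks    = 6;    /* total disks in the array        */
    int parity   = 2;    /* 2 for RAID-6, 1 for RAID-5      */

    printf("ext3: mkfs.ext3 -b %d -R stride=%d /dev/mdX\n",
           block_kb * 1024, chunk_kb / block_kb);
    printf("xfs:  mkfs -t xfs -d su=%dk,sw=%d /dev/mdX\n",
           chunk_kb, disks - parity);
    return 0;
}

For a 6-disk RAID-6 with 64KB chunks this prints stride=16 and su=64k,sw=4; newer e2fsprogs spell the raid option -E stride= rather than -R stride=.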
* Re: RAID tuning? 2006-06-13 10:19 ` Gordon Henderson @ 2006-06-13 10:21 ` Justin Piszcz 2006-06-13 10:23 ` Justin Piszcz 2006-06-13 10:32 ` Gordon Henderson 0 siblings, 2 replies; 22+ messages in thread From: Justin Piszcz @ 2006-06-13 10:21 UTC (permalink / raw) To: Gordon Henderson; +Cc: Adam Talbot, linux-raid
mkfs -t xfs -f -d su=128k,sw=14 /dev/md9

Gordon, What speed do you get on your RAID, read and write?

When I made my XFS/RAID-5, I accepted the defaults for the XFS filesystem
but used a 512kb stripe. I get 80-90MB/s reads and ~39MB/s writes.

On 5 x 400GB ATA/100 Seagates (on a regular PCI bus, max, 133mb/s)

On Tue, 13 Jun 2006, Gordon Henderson wrote:
> [...]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID tuning? 2006-06-13 10:21 ` Justin Piszcz @ 2006-06-13 10:23 ` Justin Piszcz 2006-06-13 10:32 ` Gordon Henderson 1 sibling, 0 replies; 22+ messages in thread From: Justin Piszcz @ 2006-06-13 10:23 UTC (permalink / raw) To: Gordon Henderson; +Cc: Adam Talbot, linux-raid
s/stripe/chunk size/g

On Tue, 13 Jun 2006, Justin Piszcz wrote:
> When I made my XFS/RAID-5, I accepted the defaults for the XFS filesystem
> but used a 512kb stripe. I get 80-90MB/s reads and ~39MB/s writes.
> [...]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID tuning? 2006-06-13 10:21 ` Justin Piszcz 2006-06-13 10:23 ` Justin Piszcz @ 2006-06-13 10:32 ` Gordon Henderson 2006-06-13 17:57 ` Adam Talbot 1 sibling, 1 reply; 22+ messages in thread From: Gordon Henderson @ 2006-06-13 10:32 UTC (permalink / raw) To: Justin Piszcz; +Cc: Adam Talbot, linux-raid
On Tue, 13 Jun 2006, Justin Piszcz wrote:

> mkfs -t xfs -f -d su=128k,sw=14 /dev/md9
>
> Gordon, What speed do you get on your RAID, read and write?
>
> When I made my XFS/RAID-5, I accepted the defaults for the XFS filesystem
> but used a 512kb stripe. I get 80-90MB/s reads and ~39MB/s writes.
>
> On 5 x 400GB ATA/100 Seagates (on a regular PCI bus, max, 133mb/s)

Standard Bonnie (which I find to be a crude, but reasonable test of overall
throughput of a device - server has 1G of RAM)

zem:/mounts/pdrive# bonnie -n0 -f -g0 -u0
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zem             2G           195984  59 68699  31           217398  52 1028.3   3
zem,2G,,,195984,59,68699,31,,,217398,52,1028.3,3,,,,,,,,,,,,,

Other tests:

zem:/mounts/pdrive# hdparm -T /dev/md9
/dev/md9:
 Timing cached reads: 3592 MB in 2.00 seconds = 1797.08 MB/sec

zem:/mounts/pdrive# hdparm -t /dev/sda /dev/md9
/dev/sda:
 Timing buffered disk reads: 228 MB in 3.01 seconds = 75.87 MB/sec
/dev/md9:
 Timing buffered disk reads: 242 MB in 3.02 seconds = 80.14 MB/sec

The server is some Dell 1U box with a single P4/HT processor and 1G of
RAM. It has twin internal SCSI drives on a Fusion MPT driver, drives are
using Linux s/w RAID-1, of course, and it has 2 external SCSI connectors
(dual Adaptec controller) going to some big Dell 14-drive chassis with 7
drives on each chain. I've not actually seen this box - I did the entire
build remotely with the aid of someone in front of the box who loaded Debian
on it under my instructions, until I could SSH into it and complete the
process, and did the necessary BIOS fiddling to make sure it would boot
off the internal drives - it's in California, US; I'm in Devon, UK...

Gordon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID tuning? 2006-06-13 10:32 ` Gordon Henderson @ 2006-06-13 17:57 ` Adam Talbot 2006-06-13 21:38 ` Gordon Henderson 0 siblings, 1 reply; 22+ messages in thread From: Adam Talbot @ 2006-06-13 17:57 UTC (permalink / raw) To: Gordon Henderson; +Cc: Justin Piszcz, linux-raid
I still have not figured out if "block" is per disk or per stripe?
My current array is rebuilding and states "64k chunk" - is this a per disk
number, or is that a functional stripe?

Nice script, Gordon. It turns out that I have built a script that does
the exact same thing. I call it thrash_test. Trying to take down the
array before it goes into production. It is a very good test!

I ran hdparm -t last time the array was up. It states I am running at
about 200 MB/sec. I will get you the real number when the array is done
rebuilding. Currently at 120 min left...

Can anyone give me more info on this error? Pulled from
/var/log/messages.
"raid6: read error corrected!!"
-Adam

Gordon Henderson wrote:
> Standard Bonnie (which I find to be a crude, but reasonable test of overall
> throughput of a device - server has 1G of RAM)
> [...]
> zem:/mounts/pdrive# hdparm -t /dev/sda /dev/md9
> /dev/sda:
>  Timing buffered disk reads: 228 MB in 3.01 seconds = 75.87 MB/sec
> /dev/md9:
>  Timing buffered disk reads: 242 MB in 3.02 seconds = 80.14 MB/sec
> [...]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID tuning? 2006-06-13 17:57 ` Adam Talbot @ 2006-06-13 21:38 ` Gordon Henderson 2006-06-14 15:11 ` Nix 0 siblings, 1 reply; 22+ messages in thread From: Gordon Henderson @ 2006-06-13 21:38 UTC (permalink / raw) To: Adam Talbot; +Cc: Justin Piszcz, linux-raid
On Tue, 13 Jun 2006, Adam Talbot wrote:

> I still have not figured out if "block" is per disk or per stripe?
> My current array is rebuilding and states "64k chunk" - is this a per disk
> number, or is that a functional stripe?

The block-size in the argument to mkfs is the size of the basic data block
on disk (if this is what you mean). In olden days you'd fiddle with this to
do two things: by reducing it, you can reduce the amount of wasted space on
the filesystem - eg. if you were only to write files 1K long, then with a 4K
block-size you'll waste 3K of disk space for each file; secondly, by
reducing it you can increase the number of inodes - which once upon a time
was necessary when using a disk for a usenet news spool (millions upon
millions of small files where you'd typically run out of inodes).

Of course you might be meaning something else ;-)

> Nice script, Gordon. It turns out that I have built a script that does
> the exact same thing. I call it thrash_test. Trying to take down the
> array before it goes into production. It is a very good test!

It's csh, so people can sue me if they like ;-)

I do a number of things - trying to exercise as much of the hardware at any
one time as possible - so I'll typically run bonnie on each partition, and
at the same time run a continuous wget to ftp a file into the box, and a
wget from another box to ftp a file out (big files, 1GB or so), and run a
loop of make clean ; make -j3 in the kernel build directory too. If I can
do this for a week I'm usually more than happy!!!

If anyone else has good "soaktest" type runes, I'd like to hear them to add
to my collection. (I do run cpuburn too, but I don't think it's a good
representation of "server" work on a modern machine).

> I ran hdparm -t last time the array was up. It states I am running at
> about 200 MB/sec. I will get you the real number when the array is done
> rebuilding. Currently at 120 min left...

hdparm really is quick & dirty. You might want to time a dd of /dev/md? to
/dev/null with varying block sizes and read sizes to get 'raw' streaming
performance.

> Can anyone give me more info on this error? Pulled from
> /var/log/messages.
> "raid6: read error corrected!!"

Not seen that one!!!

Gordon
^ permalink raw reply [flat|nested] 22+ messages in thread
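For the dd-style timing Gordon suggests, something like the following stand-alone reader does the same job and makes the block size explicit. It is an illustration only - the default device path /dev/md9, the 1GB read length and the program name are arbitrary choices, and repeated runs will be flattered by the page cache unless you read well past RAM size.

/* Sequential-read timer, a stand-in for "time dd if=/dev/mdX of=/dev/null bs=...".
 * Usage: ./readtest [device] [block-size-in-KB]   (defaults are arbitrary) */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

int main(int argc, char **argv)
{
    const char *dev  = argc > 1 ? argv[1] : "/dev/md9";
    size_t bs        = (argc > 2 ? atoi(argv[2]) : 64) * 1024;
    long long target = 1024LL * 1024 * 1024;      /* read 1GB then stop */

    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror(dev); return 1; }

    char *buf = malloc(bs);
    if (!buf) { close(fd); return 1; }

    struct timeval t0, t1;
    long long done = 0;
    gettimeofday(&t0, NULL);
    while (done < target) {
        ssize_t n = read(fd, buf, bs);
        if (n <= 0)            /* end of device, or an error */
            break;
        done += n;
    }
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%lld MB in %.2f s = %.1f MB/s (bs = %zu KB)\n",
           done >> 20, secs, (done / 1048576.0) / secs, bs / 1024);

    free(buf);
    close(fd);
    return 0;
}

Running it with a few different block sizes (e.g. 64, 256, 1024) gives a rough picture of how the array behaves as the read size crosses chunk and stripe boundaries.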
* Re: RAID tuning? 2006-06-13 21:38 ` Gordon Henderson @ 2006-06-14 15:11 ` Nix 2006-06-14 15:35 ` Molle Bestefich 2006-06-14 20:38 ` Disks keep failing during testing Adam Talbot 0 siblings, 2 replies; 22+ messages in thread From: Nix @ 2006-06-14 15:11 UTC (permalink / raw) To: Gordon Henderson; +Cc: Adam Talbot, Justin Piszcz, linux-raid
On 13 Jun 2006, Gordon Henderson said:

> On Tue, 13 Jun 2006, Adam Talbot wrote:
>> Can anyone give me more info on this error? Pulled from
>> /var/log/messages.
>> "raid6: read error corrected!!"
>
> Not seen that one!!!

The message is pretty easy to figure out and the code (in
drivers/md/raid6main.c) is clear enough.

The block device driver has reported a read error. In the old days
(pre-2.6.15) the drive would have been kicked from the array for that, and
the array would have dropped to degraded state; but nowadays the system
tries to rewrite the stripe that should have been there (computed from the
corresponding stripes on the other disks in the array), and only fails the
drive if that doesn't work.

Generally hard disks activate sector sparing and stop reporting read errors
for bad blocks only when the block is *written* to (it has to do that,
annoying though the read errors are; since it can't read the data off the
bad block, it can't tell what data should go onto the spare sector that
replaces it until you write it).

So it's disk damage, but unless it happens over and over again you probably
don't need to be too concerned anymore.

--
`Voting for any American political party is fundamentally
incomprehensible.' --- Vadik
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID tuning? 2006-06-14 15:11 ` Nix @ 2006-06-14 15:35 ` Molle Bestefich 0 siblings, 0 replies; 22+ messages in thread From: Molle Bestefich @ 2006-06-14 15:35 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid
Nix wrote:
> Adam Talbot wrote:
> > Can anyone give me more info on this error?
> > Pulled from /var/log/messages.
> > "raid6: read error corrected!!"
>
> The message is pretty easy to figure out and the code (in
> drivers/md/raid6main.c) is clear enough.

But the message could be clearer; for instance, it would be a very good
service to the user if the message showed which block had been corrected.

Also, using two !! in place of one ! makes the message seem a little
dumb :-).
^ permalink raw reply [flat|nested] 22+ messages in thread
* Disks keep failing during testing 2006-06-14 15:11 ` Nix 2006-06-14 15:35 ` Molle Bestefich @ 2006-06-14 20:38 ` Adam Talbot 2006-06-14 21:45 ` PFC 1 sibling, 1 reply; 22+ messages in thread From: Adam Talbot @ 2006-06-14 20:38 UTC (permalink / raw) To: Nix; +Cc: Gordon Henderson, Justin Piszcz, linux-raid
You guys are going to love this one... I have ripped most of my hair out
because of this bug and cannot track down the problem!

Whenever I build my new array I lose a disk in slot N. So, a bad disk?
Move that disk to a different slot in my drive cage and it rebuilds just
fine. A new disk in slot N fails... Bad cable? Replace the cable to slot
N... The drive in slot N still fails. Get mad at the drive cage. Get a new
drive cage... The same slot "N" in the new drive cage fails... Bad
controller card? New controller card. So I have now replaced the drive,
drive cage, cable and controller card... I am going mad.

My motherboard has 4 onboard SATA ports, and I have never had a drive fail
on any of those 4 ports... Concept test: took 4 "failed" drives and
installed them in the 4 slots connected to the onboard SATA ports... built
the array. The array built in record time, at 50MB/sec! A 4-disk array
works fine? OK, try the test 9 more times, just to be sure. Never once did
the array fail during building or testing in the 4-drive/onboard state.

What is wrong with my controller cards?! I can connect them to any one of
the drives in my array, in any slot in the cage, and watch that drive
fail... The drives never fail at the same spot. All disks now have over
1000 errors in the SMART logs !@#$. All the exact same error:

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
35 00 f8 08 08 00 e0 00      00:06:26.949  WRITE DMA EXT
35 00 f8 08 08 00 e0 00      00:06:26.907  WRITE DMA EXT
35 00 f8 08 08 00 e0 00      00:06:26.866  WRITE DMA EXT
35 00 f8 08 08 00 e0 00      00:06:26.824  WRITE DMA EXT
35 00 f8 08 08 00 e0 00      00:06:26.782  WRITE DMA EXT

My NAS's hardware:
AMD Athlon 64 3000
1024MB DDR400
Foxconn 6150K8MA-8EKRS motherboard
Off-brand 6X SATA drive cage
6X MaxLine 300GB SATAII 7200RPM hard drives
Promise SATA300 TX2plus controller card
Silicon Image, Inc. SiI 3112 controller card

Any ideas?
-Adam
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Disks keep failing during testing 2006-06-14 20:38 ` Disks keep failing during testing Adam Talbot @ 2006-06-14 21:45 ` PFC 2006-06-14 23:23 ` Adam Talbot 2006-06-15 12:12 ` Leo Kliger 0 siblings, 2 replies; 22+ messages in thread From: PFC @ 2006-06-14 21:45 UTC (permalink / raw) To: Adam Talbot, Nix; +Cc: Gordon Henderson, Justin Piszcz, linux-raid
> 6X MaxLine 300GB SATAII 7200RPM hard drives

Flip the jumper on the drive to set it to SATA1 (1.5 Gbps) instead of SATA2.
On my PC, Maxtor drives kept failing until I did this; not a problem since.

Of course YMMV.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Disks keep failing during testing 2006-06-14 21:45 ` PFC @ 2006-06-14 23:23 ` Adam Talbot 2006-06-15 12:12 ` Leo Kliger 0 siblings, 0 replies; 22+ messages in thread From: Adam Talbot @ 2006-06-14 23:23 UTC (permalink / raw) To: PFC; +Cc: Nix, Gordon Henderson, Justin Piszcz, linux-raid
Wow... My greatest thanks... That is one heck of a bug.
Any clue how to reset my SMART logs to 0 errors?

PFC wrote:
>
>> 6X MaxLine 300GB SATAII 7200RPM hard drives
>
> Flip the jumper on the drive to set it to SATA1 (1.5 Gbps) instead
> of SATA2.
> On my PC, Maxtor drives kept failing until I did this; not a
> problem since.
>
> Of course YMMV.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Disks keep failing during testing 2006-06-14 21:45 ` PFC 2006-06-14 23:23 ` Adam Talbot @ 2006-06-15 12:12 ` Leo Kliger 1 sibling, 0 replies; 22+ messages in thread From: Leo Kliger @ 2006-06-15 12:12 UTC (permalink / raw) To: PFC; +Cc: linux-raid
On Wed, 2006-06-14 at 23:45 +0200, PFC wrote:
> > 6X MaxLine 300GB SATAII 7200RPM hard drives
>
> Flip the jumper on the drive to set it to SATA1 (1.5 Gbps) instead of
> SATA2.
> On my PC, Maxtor drives kept failing until I did this; not a problem since.
>
> Of course YMMV.

Is this possible on all SATAII drives? I have a similar issue with Western
Digital drives and can't find this information on their website...

Actually, check out the following link and look at their graphic HOWTO:

http://www.westerndigital.com/en/products/products.asp?driveid=131&language=en

It appears that the one jumper setting they discuss (for SSC - is this what
I'm looking for?) is described as being enabled and disabled with or without
the jumper.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: to understand the logic of raid0_make_request 2006-06-13 0:49 ` Neil Brown 2006-06-13 1:18 ` RAID tuning? Adam Talbot @ 2006-06-13 18:45 ` Bill Davidsen 2006-06-16 2:53 ` liu yang 2 siblings, 0 replies; 22+ messages in thread From: Bill Davidsen @ 2006-06-13 18:45 UTC (permalink / raw) To: Neil Brown; +Cc: liu yang, linux-raid
Neil Brown wrote:

> On Tuesday June 13, liudows2@gmail.com wrote:
>> Can anyone tell me what the function raid0_make_request will eventually do?
>
> One of two possibilities.
> [...]
> I hope that helps.

Helps me, anyway, thanks! I wish the comments on stuff like that were
generally clearer: you can see what the code *does*, but you have to hope
that it's what the coder *intended*. And if you're looking for a bug it may
not be, so this is not an idle complaint. Some of the kernel coders think
"if it was hard to write it should be hard to understand."

--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: to understand the logic of raid0_make_request 2006-06-13 0:49 ` Neil Brown 2006-06-13 1:18 ` RAID tuning? Adam Talbot 2006-06-13 18:45 ` to understand the logic of raid0_make_request Bill Davidsen @ 2006-06-16 2:53 ` liu yang 2006-06-16 4:32 ` Neil Brown 2 siblings, 1 reply; 22+ messages in thread From: liu yang @ 2006-06-16 2:53 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid
2006/6/13, Neil Brown <neilb@suse.de>:
> Most often it will update bio->bi_bdev and bio->bi_sector to refer to
> the correct location on the correct underlying device, and then
> will return '1'.
> [...]
> I hope that helps.

Thanks a lot. I went through the code again following your guide, but I
still can't understand how bio->bi_sector and bio->bi_bdev are computed.
I don't know what the var 'block' stands for.
Could you explain them to me?
Thanks!
Regards.
YangLiu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: to understand the logic of raid0_make_request 2006-06-16 2:53 ` liu yang @ 2006-06-16 4:32 ` Neil Brown 2006-06-16 5:14 ` RAID on the root partition / Adam Talbot 2006-06-16 13:41 ` to understand the logic of raid0_make_request liu yang 0 siblings, 2 replies; 22+ messages in thread From: Neil Brown @ 2006-06-16 4:32 UTC (permalink / raw) To: liu yang; +Cc: linux-raid
On Friday June 16, liudows2@gmail.com wrote:
>
> Thanks a lot. I went through the code again following your guide, but I
> still can't understand how bio->bi_sector and bio->bi_bdev are computed.
> I don't know what the var 'block' stands for.
> Could you explain them to me?

'block' is simply "bi_sector/2" - the device offset in kilobytes
rather than in sectors.

raid0 supports having devices of different sizes.
The array is divided into 'zones'.
The first zone has all devices, and extends as far as the smallest
device.
The last zone extends to the end of the largest device, and may have
only one, or several, devices in it.
There may be other zones depending on how many different sizes of device
there are.

The first thing that happens is the correct zone is found by looking in
the hash_table. Then we subtract the zone offset, divide by the chunk
size, and then divide by the number of devices in that zone. The
remainder of this last division tells us which device to use.
Then we multiply back out to find the offset in that device.

I know that is rather brief, but I hope it helps.

NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
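The arithmetic Neil describes can be modelled in a few lines of user-space C. The zone table below is a made-up two-disk example, the zone lookup is a plain scan instead of the kernel's hash table, and the rdev data_offset is ignored, so treat this as an illustration of the subtract/divide/remainder steps rather than a copy of raid0.c.

/* User-space model of the raid0 mapping arithmetic described above.
 * Illustration only: made-up zone table, linear zone search, and the
 * rdev data_offset is ignored. */
#include <stdio.h>

struct zone {
    unsigned long zone_offset;   /* where the zone starts, in KB from array start   */
    unsigned long size;          /* zone length in KB                                */
    int nb_dev;                  /* number of devices striped in this zone           */
    int first_dev;               /* index of the first of those devices              */
    unsigned long dev_offset;    /* KB already used on each member before this zone  */
};

/* Example: raid0 of a 100000 KB disk (dev 0) and a 150000 KB disk (dev 1),
 * 64 KB chunks.  Zone 0 stripes across both; zone 1 is the tail of dev 1. */
static struct zone zones[] = {
    { 0,      200000, 2, 0, 0      },
    { 200000,  50000, 1, 1, 100000 },
};
static const unsigned long chunk_kb = 64;

static void map(unsigned long long bi_sector)
{
    unsigned long long block = bi_sector >> 1;        /* 'block': offset in KB */
    struct zone *z = zones;

    while (block >= z->zone_offset + z->size)         /* find the zone */
        z++;

    unsigned long long rel    = block - z->zone_offset;             /* KB into the zone   */
    unsigned long long x      = rel / chunk_kb;                     /* chunk nr in zone   */
    int dev                   = z->first_dev + (int)(x % z->nb_dev);/* which device       */
    unsigned long long chunk  = x / z->nb_dev;                      /* chunk nr on device */
    unsigned long long dev_kb = z->dev_offset + chunk * chunk_kb + rel % chunk_kb;

    printf("array sector %7llu -> dev %d, sector %llu\n",
           bi_sector, dev, dev_kb * 2 + (bi_sector & 1));
}

int main(void)
{
    map(0);        /* first chunk  -> dev 0                   */
    map(128);      /* second chunk -> dev 1, its first chunk  */
    map(256);      /* third chunk  -> dev 0 again, 64 KB in   */
    map(400002);   /* past 200000 KB -> zone 1, tail of dev 1 */
    return 0;
}

Compiling and running it shows consecutive 64 KB chunks alternating between the two devices until the smaller one is exhausted, after which everything maps onto the tail of the larger device - which is exactly the zone behaviour described above.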
* RAID on the root partition / 2006-06-16 4:32 ` Neil Brown @ 2006-06-16 5:14 ` Adam Talbot 2006-06-16 6:13 ` Neil Brown 2006-06-16 6:55 ` Gordon Henderson 1 sibling, 2 replies; 22+ messages in thread From: Adam Talbot @ 2006-06-16 5:14 UTC (permalink / raw) Cc: linux-raid
What I hope to be an easy fix. Running Gentoo Linux and trying to set up
RAID 1 across the root partition, hda3 and hdc3. I have the fstab set up to
look for /dev/md3 and I have built the OS on /dev/md3. It works fine until
I reboot: the system loads, states it cannot find /dev/md3, and when I look,
md3 is not started. I have MD built into the kernel, and I have mdadm.conf
set up like so:

DEVICE /dev/hda3 /dev/hdc3
ARRAY /dev/md3 level=raid1 devices=/dev/hda3,/dev/hdc3

What am I missing?
-Adam
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID on the root partition / 2006-06-16 5:14 ` RAID on the root partition / Adam Talbot @ 2006-06-16 6:13 ` Neil Brown 0 siblings, 0 replies; 22+ messages in thread From: Neil Brown @ 2006-06-16 6:13 UTC (permalink / raw) To: Adam Talbot; +Cc: linux-raid
On Thursday June 15, talbotx@comcast.net wrote:
> What I hope to be an easy fix. Running Gentoo Linux and trying to setup
> RAID 1 across the root partition hda3 hdc3. Have the fstab set up to
> look for /dev/md3 and I have built the OS on /dev/md3. Works fine until
> I reboot. System loads and states it can not find /dev/md3 and when I
> look md3 not started. I have MD as part of the kernel, I have
> mdadm.conf setup like so:
>
> DEVICE /dev/hda3 /dev/hdc3
> ARRAY /dev/md3 level=raid1 devices=/dev/hda3,/dev/hdc3
>
> What am I missing?

If that mdadm.conf is on the root filesystem, then obviously it is of no
value for setting up the root filesystem.

You can:

- use an initramfs to do the work. Many distros do this for you.
  The mdadm source contains some notes and scripts that I use.

- use a boot parameter of
      md=3,/dev/hda3,/dev/hdc3
  This is probably easiest, and as ide drives don't change their names
  this should be safe. initramfs is best though.

NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID on the root partition / 2006-06-16 5:14 ` RAID on the root partition / Adam Talbot 2006-06-16 6:13 ` Neil Brown @ 2006-06-16 6:55 ` Gordon Henderson 1 sibling, 0 replies; 22+ messages in thread From: Gordon Henderson @ 2006-06-16 6:55 UTC (permalink / raw) To: Adam Talbot; +Cc: linux-raid
On Thu, 15 Jun 2006, Adam Talbot wrote:
> What I hope to be an easy fix. Running Gentoo Linux and trying to setup
> RAID 1 across the root partition hda3 hdc3. Have the fstab set up to
> look for /dev/md3 and I have built the OS on /dev/md3. Works fine until
> I reboot. System loads and states it can not find /dev/md3 and when I
> look md3 not started. I have MD as part of the kernel, I have
> mdadm.conf setup like so:
>
> DEVICE /dev/hda3 /dev/hdc3
> ARRAY /dev/md3 level=raid1 devices=/dev/hda3,/dev/hdc3
>
> What am I missing?

Have you got the partition types set to 0xFD?

Gordon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: to understand the logic of raid0_make_request 2006-06-16 4:32 ` Neil Brown 2006-06-16 5:14 ` RAID on the root partition / Adam Talbot @ 2006-06-16 13:41 ` liu yang 1 sibling, 0 replies; 22+ messages in thread From: liu yang @ 2006-06-16 13:41 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid
Neil, thanks for your help. Your explanation really is helpful.
I went through the code of raid0_make_request again this evening, and I
still have some questions.

1) block = bio->bi_sector >> 1 - it's the device offset in kilobytes.
So why do we subtract zone->zone_offset from block? The zone->zone_offset
is the zone offset relative to the mddev, in sectors.

2) In the code below:

x = block >> chunksize_bits;
tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];

we actually get the underlying device from 'sector_div(x, zone->nb_dev)'.
The var x is the chunk nr relative to the start of the mddev, in my opinion.
But not all of the zones have the same nb_dev, so we can't get the right
rdev from 'sector_div(x, zone->nb_dev)', I think. Why? Could you explain
them to me?

Thanks!
Regards.
YangLiu

2006/6/16, Neil Brown <neilb@suse.de>:
> 'block' is simply "bi_sector/2" - the device offset in kilobytes
> rather than in sectors.
> [...]
> I know that is rather brief, but I hope it helps.
^ permalink raw reply [flat|nested] 22+ messages in thread