* iostat with raid device...
From: Linux Raid Study <linuxraid.study@gmail.com> @ 2011-04-08 19:55 UTC
To: linux-raid; +Cc: linuxraid.study

Hello,

I have a RAID device /dev/md0 based on 4 devices, sd[abcd].

When I write 4GB to /dev/md0, I see the following output from iostat.

Question: shouldn't I see writes/sec be the same for all four drives? Why does
/dev/sdd always have a higher value for Blk_wrtn/s? My stripe size is 1MB.

Thanks for any pointers.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.02    0.00    0.34    0.03    0.00   99.61

Device:    tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
sda       1.08       247.77       338.73    37478883    51237136
sda1      1.08       247.77       338.73    37478195    51237136
sdb       1.08       247.73       338.78    37472990    51245712
sdb1      1.08       247.73       338.78    37472302    51245712
sdc       1.10       247.82       338.66    37486670    51226640
sdc1      1.10       247.82       338.66    37485982    51226640
sdd       1.09       118.46       467.97    17918510    70786576
sdd1      1.09       118.45       467.97    17917822    70786576
md0      65.60       443.79      1002.42    67129812   151629440
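Per-device counters like the ones above come from iostat in the sysstat
package; a minimal sketch for reproducing this kind of capture (the device
names are the ones used in this setup):

  # cumulative totals since boot, as in the table above
  iostat -d /dev/sd[a-d] /dev/md0

  # interval rates are usually more telling while the copy is running:
  # one report every 5 seconds for the listed devices
  iostat -d 5 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/md0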
* Re: iostat with raid device...
From: Roberto Spadim @ 2011-04-08 22:05 UTC
To: Linux Raid Study; +Cc: linux-raid

Another question: why does md0 show a higher tps? Disk elevators? Sector size?

2011/4/8 Linux Raid Study <linuxraid.study@gmail.com>:
> Shouldn't I see writes/sec be the same for all four drives? Why does
> /dev/sdd always have a higher value for Blk_wrtn/s?
> My stripe size is 1MB.

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: iostat with raid device...
From: Linux Raid Study @ 2011-04-08 22:10 UTC
To: Roberto Spadim; +Cc: linux-raid

Thanks for pointing this out. I did observe this but forgot to mention it in
the email.

Can someone give some insight into this? Thanks.

On Fri, Apr 8, 2011 at 3:05 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
> Another question: why does md0 show a higher tps? Disk elevators? Sector size?
* Re: iostat with raid device...
From: NeilBrown @ 2011-04-08 23:46 UTC
To: Linux Raid Study; +Cc: linux-raid

On Fri, 8 Apr 2011 12:55:39 -0700 Linux Raid Study
<linuxraid.study@gmail.com> wrote:

> I have a RAID device /dev/md0 based on 4 devices, sd[abcd].

Would this be raid0? raid1? raid5? raid6? raid10?
It could make a difference.

> When I write 4GB to /dev/md0, I see the following output from iostat.

Are you writing directly to /dev/md0, or to a filesystem mounted from
/dev/md0? It might be easier to explain in the second case, but your text
suggests the first.

> Question: shouldn't I see writes/sec be the same for all four drives? Why
> does /dev/sdd always have a higher value for Blk_wrtn/s? My stripe size
> is 1MB.
>
> Device:    tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
> sda       1.08       247.77       338.73    37478883    51237136
> sdb       1.08       247.73       338.78    37472990    51245712
> sdc       1.10       247.82       338.66    37486670    51226640
> sdd       1.09       118.46       467.97    17918510    70786576
> md0      65.60       443.79      1002.42    67129812   151629440

Doing the sums, for every 2 blocks written to md0 we see roughly 3 blocks
written to some underlying device. That doesn't make much sense for a 4-drive
array. If we assume that the extra writes to sdd came from some other source,
it is closer to a 3:4 ratio, which suggests raid5.

So I'm guessing that the array is newly created and is recovering the data on
sdd1 at the same time as you are running the IO test. This would agree with
the observation that sd[abc] see a lot more reads than sdd.

I'll let you figure out the tps number... do the math to find the average
blocks-per-transfer for each device.

NeilBrown
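Working those sums through with the Blk_wrtn/s figures above gives roughly:

  member writes: 338.73 + 338.78 + 338.66 + 467.97 ~ 1484 blk/s
  md0 writes:    1002.42 blk/s              -> 1484 : 1002 ~ 3 : 2

  treating sdd's extra ~129 blk/s (467.97 - ~338.7) as recovery traffic:
  member writes: 4 x 338.7 ~ 1355 blk/s     -> 1002 : 1355 ~ 3 : 4

which is what full-stripe writes on a 4-drive raid5 produce: 3 data blocks
submitted to md0 become 4 blocks on the members (3 data + 1 parity).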
* Re: iostat with raid device...
From: Linux Raid Study @ 2011-04-09 0:40 UTC
To: NeilBrown; +Cc: linux-raid

Hi Neil,

This is raid5. I have mounted /dev/md0 on /mnt and the file system is ext4.
The array is newly created. Steps:

  mdadm for raid5
  mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/raid
  Export /mnt/raid to a remote PC using CIFS
  Copy a file from the PC to the mounted share

An update: I just ran the test again (without reformatting the device) and
noticed that all 4 HDDs incremented the number of written blocks equally.
This implies that when the raid was first configured, raid5 was doing its own
work (recovery) in the background.

What I'm not sure of is: if the device is newly formatted, would raid
recovery still happen? What else could explain the difference in the first
run of the IO benchmark?

Thanks.

On Fri, Apr 8, 2011 at 4:46 PM, NeilBrown <neilb@suse.de> wrote:
> Would this be raid0? raid1? raid5? raid6? raid10?
> It could make a difference.
>
> Are you writing directly to /dev/md0, or to a filesystem mounted from
> /dev/md0?
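For reference, that sequence corresponds to something like the following
sketch (the member devices, chunk size and share name are assumptions based
on this thread, not the exact commands used):

  # create a 4-drive raid5 array with a 1MB chunk
  mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=1024 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # put ext4 on the array and mount it
  mkfs.ext4 /dev/md0
  mkdir -p /mnt/raid
  mount /dev/md0 /mnt/raid

  # export the mount point over CIFS, e.g. a Samba share in smb.conf:
  #   [raid]
  #       path = /mnt/raid
  #       writable = yes
  # then copy the 4GB test file from the Windows PC to the share.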
* Re: iostat with raid device...
From: Robin Hill @ 2011-04-09 8:50 UTC
To: Linux Raid Study; +Cc: NeilBrown, linux-raid

On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:

> What I'm not sure of is: if the device is newly formatted, would raid
> recovery still happen? What else could explain the difference in the first
> run of the IO benchmark?

When an array is first created, it is created in a degraded state - this is
the simplest way to make it available to the user instantly. The final
drive(s) are then automatically rebuilt, calculating the parity/data
information just as when recovering a failed drive.

Cheers,
    Robin
* Re: iostat with raid device...
From: Linux Raid Study @ 2011-04-11 8:32 UTC
To: Linux Raid Study, NeilBrown, linux-raid

Hi Robin,

Thanks. So the uneven (unequal) distribution of the write/sec numbers in the
iostat output is OK - is that correct?

Thanks.

On Sat, Apr 9, 2011 at 1:50 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> When an array is first created, it is created in a degraded state - this is
> the simplest way to make it available to the user instantly. The final
> drive(s) are then automatically rebuilt, calculating the parity/data
> information just as when recovering a failed drive.
* Re: iostat with raid device...
From: Robin Hill @ 2011-04-11 9:25 UTC
To: Linux Raid Study; +Cc: NeilBrown, linux-raid

On Mon Apr 11, 2011 at 01:32:34 -0700, Linux Raid Study wrote:

> Thanks. So the uneven (unequal) distribution of the write/sec numbers in the
> iostat output is OK - is that correct?

If it hadn't completed the initial recovery, yes. If it _had_ completed the
initial recovery then I'd expect writes to be balanced (barring any
differences in hardware).

Cheers,
    Robin
* Re: iostat with raid device...
From: Linux Raid Study @ 2011-04-11 9:36 UTC
To: Linux Raid Study, NeilBrown, linux-raid; +Cc: Robin Hill

The initial recovery should normally be done within the first few minutes;
this is a newly formatted disk, so there isn't any user data there. So if I
run the IO benchmark 3-4 minutes after doing:

  mdadm --create /dev/md0 --raid5...
  mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/raid

  ...wait 3-4 min

  run IO benchmark...

I should be OK? Am I correct?

Thanks.

On Mon, Apr 11, 2011 at 2:25 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> If it hadn't completed the initial recovery, yes. If it _had_ completed the
> initial recovery then I'd expect writes to be balanced (barring any
> differences in hardware).
* Re: iostat with raid device...
From: Robin Hill @ 2011-04-11 9:53 UTC
To: Linux Raid Study; +Cc: linux-raid

On Mon Apr 11, 2011 at 02:36:50AM -0700, Linux Raid Study wrote:

> The initial recovery should normally be done within the first few minutes;
> this is a newly formatted disk, so there isn't any user data there. So if I
> run the IO benchmark 3-4 minutes after creating the array, I should be OK?

No - depending on the size of the drives, the initial recovery can take hours
or even days. For RAID5 with N drives, it needs to read the entirety of (N-1)
drives and write the entirety of the remaining drive. Whether there's any
data or not, the initial state of the drives is unknown, so parity has to be
calculated for the entire array.

Check /proc/mdstat and wait until the array has completed its resync before
running any benchmarks.

Cheers,
    Robin
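For example, a sketch using the array name from this thread:

  # show rebuild/resync progress for all md arrays
  cat /proc/mdstat

  # poll until no resync/recovery is in progress, then benchmark
  while grep -qE 'resync|recovery' /proc/mdstat; do
      sleep 60
  done
  echo "initial build finished - safe to benchmark"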
* Re: iostat with raid device...
From: NeilBrown @ 2011-04-11 10:18 UTC
To: Robin Hill; +Cc: Linux Raid Study, linux-raid

On Mon, 11 Apr 2011 10:53:55 +0100 Robin Hill <robin@robinhill.me.uk> wrote:

> Check /proc/mdstat and wait until the array has completed its resync before
> running any benchmarks.

Or run

  mdadm --wait /dev/md0

or create the array with --assume-clean. But if the array is raid5, don't
trust the data if a device fails: use this only for testing.

NeilBrown
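In concrete terms, a sketch (with the same hypothetical member devices and
chunk size as earlier):

  # option 1: create normally, then block until the initial build finishes
  mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=1024 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  mdadm --wait /dev/md0      # returns once resync/recovery is complete

  # option 2 (testing only): skip the initial sync altogether
  mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=1024 \
        --assume-clean /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1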
* Re: iostat with raid device...
From: Linux Raid Study @ 2011-04-12 1:57 UTC
To: NeilBrown; +Cc: Robin Hill, linux-raid

If I use --assume-clean with mdadm, I see that performance is 10-15% lower
than in the case where this option is not specified. When I run without
--assume-clean, I wait until mdadm prints "recovery_done" and then run the IO
benchmarks.

Is this performance drop expected?

Thanks.

On Mon, Apr 11, 2011 at 3:18 AM, NeilBrown <neilb@suse.de> wrote:
> Or run
>
>   mdadm --wait /dev/md0
>
> or create the array with --assume-clean. But if the array is raid5, don't
> trust the data if a device fails: use this only for testing.
* Re: iostat with raid device...
From: NeilBrown @ 2011-04-12 2:51 UTC
To: Linux Raid Study; +Cc: Robin Hill, linux-raid

On Mon, 11 Apr 2011 18:57:34 -0700 Linux Raid Study
<linuxraid.study@gmail.com> wrote:

> If I use --assume-clean with mdadm, I see that performance is 10-15% lower
> than in the case where this option is not specified.
>
> Is this performance drop expected?

No, and I cannot explain it... unless the array is so tiny that it all fits
in the stripe cache (typically about 1 Meg).

There really should be no difference.

NeilBrown
* Re: iostat with raid device...
From: Linux Raid Study @ 2011-04-12 19:36 UTC
To: NeilBrown; +Cc: Robin Hill, linux-raid

Hello Neil,

For benchmarking purposes, I've configured an array of ~30GB.
stripe_cache_size is 1024 (so 1M).

BTW, I'm using the Windows copy (robocopy) utility to test performance, and I
believe the block size it uses is 32kB. But since everything gets written
through the VFS, I'm not sure how to change stripe_cache_size to get optimal
performance with this setup.

Thanks.

On Mon, Apr 11, 2011 at 7:51 PM, NeilBrown <neilb@suse.de> wrote:
> No, and I cannot explain it... unless the array is so tiny that it all fits
> in the stripe cache (typically about 1 Meg).
>
> There really should be no difference.
* Re: iostat with raid device...
From: Linux Raid Study @ 2011-04-13 18:21 UTC
To: NeilBrown; +Cc: Robin Hill, linux-raid

Let me reword the previous email.

I tried to change stripe_cache_size as follows, and tried values between 16
and 4096:

  echo 512 > /sys/block/md0/md/stripe_cache_size

But I'm not seeing much difference in performance. I'm running on a 2.6.27sh
kernel.

Any ideas? Thanks for your help.

On Tue, Apr 12, 2011 at 12:36 PM, Linux Raid Study
<linuxraid.study@gmail.com> wrote:
> For benchmarking purposes, I've configured an array of ~30GB.
> stripe_cache_size is 1024 (so 1M).
>
> BTW, I'm using the Windows copy (robocopy) utility to test performance, and
> I believe the block size it uses is 32kB.
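One way to make such a sweep repeatable is to drive it with a fixed local
workload per setting instead of a CIFS copy - a rough sketch (the mount point
and file size are assumptions):

  # time a large sequential write for each stripe_cache_size value
  for size in 256 512 1024 2048 4096; do
      echo $size > /sys/block/md0/md/stripe_cache_size
      sync
      echo 3 > /proc/sys/vm/drop_caches        # start with a cold page cache
      dd if=/dev/zero of=/mnt/raid/testfile bs=1M count=4096 \
          conv=fdatasync 2>&1 | tail -n 1      # last line reports throughput
      rm -f /mnt/raid/testfile
  done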
* Re: iostat with raid device...
From: NeilBrown @ 2011-04-13 21:00 UTC
To: Linux Raid Study; +Cc: Robin Hill, linux-raid

On Wed, 13 Apr 2011 11:21:52 -0700 Linux Raid Study
<linuxraid.study@gmail.com> wrote:

> I tried to change stripe_cache_size as follows, and tried values between 16
> and 4096:
>
>   echo 512 > /sys/block/md0/md/stripe_cache_size
>
> But I'm not seeing much difference in performance. I'm running on a 2.6.27sh
> kernel.

I wouldn't expect much difference.

> Any ideas?

On what, exactly? What exactly are you doing, and what exactly are the
results? What exactly don't you understand? Details help.

NeilBrown