linux-raid.vger.kernel.org archive mirror
* iostat with raid device...
@ 2011-04-08 19:55 Linux Raid Study
  2011-04-08 22:05 ` Roberto Spadim
  2011-04-08 23:46 ` NeilBrown
  0 siblings, 2 replies; 16+ messages in thread
From: Linux Raid Study @ 2011-04-08 19:55 UTC (permalink / raw)
  To: linux-raid; +Cc: linuxraid.study

Hello,

I have a raid device /dev/md0 based on 4 devices sd[abcd].

When I write 4GB to /dev/md0, I see the following output from iostat...

Questions:
Shouldn't the writes/sec be the same for all four drives? Why does
/dev/sdd always show a higher Blk_wrtn/s value?
My stripe size is 1MB.

Thanks for any pointers...

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.02    0.00    0.34    0.03    0.00   99.61

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.08       247.77       338.73   37478883   51237136
sda1              1.08       247.77       338.73   37478195   51237136
sdb               1.08       247.73       338.78   37472990   51245712
sdb1              1.08       247.73       338.78   37472302   51245712
sdc               1.10       247.82       338.66   37486670   51226640
sdc1              1.10       247.82       338.66   37485982   51226640
sdd               1.09       118.46       467.97   17918510   70786576
sdd1              1.09       118.45       467.97   17917822   70786576
md0              65.60       443.79      1002.42   67129812  151629440


* Re: iostat with raid device...
  2011-04-08 19:55 iostat with raid device Linux Raid Study
@ 2011-04-08 22:05 ` Roberto Spadim
  2011-04-08 22:10   ` Linux Raid Study
  2011-04-08 23:46 ` NeilBrown
  1 sibling, 1 reply; 16+ messages in thread
From: Roberto Spadim @ 2011-04-08 22:05 UTC (permalink / raw)
  To: Linux Raid Study; +Cc: linux-raid

Another question... why does md0 show a higher tps?  Disk elevators?  Sector size?

2011/4/8 Linux Raid Study <linuxraid.study@gmail.com>:
> Hello,
>
> I have a raid device /dev/md0 based on 4 devices sd[abcd].
>
> When I write 4GB to /dev/md0, I see following output from iostat...
>
> Ques:
> Shouldn't I see write/sec to be same for all four drives? Why does
> /dev/sdd always have higher value for  BlksWrtn/sec?
> My strip size is 1MB.
>
> thanks for any pointers...
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.02    0.00    0.34    0.03    0.00   99.61
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               1.08       247.77       338.73   37478883   51237136
> sda1              1.08       247.77       338.73   37478195   51237136
> sdb               1.08       247.73       338.78   37472990   51245712
> sdb1              1.08       247.73       338.78   37472302   51245712
> sdc               1.10       247.82       338.66   37486670   51226640
> sdc1              1.10       247.82       338.66   37485982   51226640
> sdd               1.09       118.46       467.97   17918510   70786576
> sdd1              1.09       118.45       467.97   17917822   70786576
> md0              65.60       443.79      1002.42   67129812  151629440



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: iostat with raid device...
  2011-04-08 22:05 ` Roberto Spadim
@ 2011-04-08 22:10   ` Linux Raid Study
  0 siblings, 0 replies; 16+ messages in thread
From: Linux Raid Study @ 2011-04-08 22:10 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid

Thanks for pointing this out... I did observe this but forgot to mention it
in the email.

Can someone give some insight into this?

Thanks.

On Fri, Apr 8, 2011 at 3:05 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
> another question... why md have more tps? disk elevators? sector size?
>
> 2011/4/8 Linux Raid Study <linuxraid.study@gmail.com>:
>> Hello,
>>
>> I have a raid device /dev/md0 based on 4 devices sd[abcd].
>>
>> When I write 4GB to /dev/md0, I see following output from iostat...
>>
>> Ques:
>> Shouldn't I see write/sec to be same for all four drives? Why does
>> /dev/sdd always have higher value for  BlksWrtn/sec?
>> My strip size is 1MB.
>>
>> thanks for any pointers...
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           0.02    0.00    0.34    0.03    0.00   99.61
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>> sda               1.08       247.77       338.73   37478883   51237136
>> sda1              1.08       247.77       338.73   37478195   51237136
>> sdb               1.08       247.73       338.78   37472990   51245712
>> sdb1              1.08       247.73       338.78   37472302   51245712
>> sdc               1.10       247.82       338.66   37486670   51226640
>> sdc1              1.10       247.82       338.66   37485982   51226640
>> sdd               1.09       118.46       467.97   17918510   70786576
>> sdd1              1.09       118.45       467.97   17917822   70786576
>> md0              65.60       443.79      1002.42   67129812  151629440
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>


* Re: iostat with raid device...
  2011-04-08 19:55 iostat with raid device Linux Raid Study
  2011-04-08 22:05 ` Roberto Spadim
@ 2011-04-08 23:46 ` NeilBrown
  2011-04-09  0:40   ` Linux Raid Study
  1 sibling, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-04-08 23:46 UTC (permalink / raw)
  To: Linux Raid Study; +Cc: linux-raid

On Fri, 8 Apr 2011 12:55:39 -0700 Linux Raid Study
<linuxraid.study@gmail.com> wrote:

> Hello,
> 
> I have a raid device /dev/md0 based on 4 devices sd[abcd].

Would this be raid0? raid1? raid5? raid6? raid10?
It could make a difference.

> 
> When I write 4GB to /dev/md0, I see following output from iostat...

Are you writing directly to /dev/md0, or to a filesystem mounted
from /dev/md0?  It might be easier to explain in the second case, but your
text suggests the first.

> 
> Ques:
> Shouldn't I see write/sec to be same for all four drives? Why does
> /dev/sdd always have higher value for  BlksWrtn/sec?
> My strip size is 1MB.
> 
> thanks for any pointers...
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.02    0.00    0.34    0.03    0.00   99.61
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               1.08       247.77       338.73   37478883   51237136
> sda1              1.08       247.77       338.73   37478195   51237136
> sdb               1.08       247.73       338.78   37472990   51245712
> sdb1              1.08       247.73       338.78   37472302   51245712
> sdc               1.10       247.82       338.66   37486670   51226640
> sdc1              1.10       247.82       338.66   37485982   51226640
> sdd               1.09       118.46       467.97   17918510   70786576
> sdd1              1.09       118.45       467.97   17917822   70786576
> md0              65.60       443.79      1002.42   67129812  151629440

Doing the sums, for every 2 blocks written to md0 we see 3 blocks written to
some underlying device.  That doesn't make much sense for a 4 drive array.
If we assume that the extra writes to sdd were from some other source, then
it is closer to a 3:4 ratio, which suggests raid5.
So I'm guessing that the array is newly created and is recovering the data on
sdd1 at the same time as you are doing the IO test.
This would agree with the observation that sd[abc] see a lot more reads than
sdd.
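(Roughly, for illustration, using the numbers above: the four Blk_wrtn/s
figures sum to 338.73 + 338.78 + 338.66 + 467.97, about 1484, against
1002.42 for md0, which is close to 2:3.  If sdd had been writing only ~339
like the others, the total would be ~1355, i.e. about 3:4, which is what a
4-drive raid5 should give: 3 data blocks plus 1 parity block per stripe.)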

I'll let you figure out the tps numbers... do the math to find the
average blocks per transfer for each device.

NeilBrown





* Re: iostat with raid device...
  2011-04-08 23:46 ` NeilBrown
@ 2011-04-09  0:40   ` Linux Raid Study
  2011-04-09  8:50     ` Robin Hill
  0 siblings, 1 reply; 16+ messages in thread
From: Linux Raid Study @ 2011-04-09  0:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi Neil,

This is raid5.  I have /dev/md0 mounted at /mnt/raid, and the filesystem is ext4.

The system is newly created.  Steps (sketched as commands below):
mdadm create for raid5
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/raid
export /mnt/raid to a remote PC using CIFS
copy a file from the PC to the mounted drive
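Roughly, as commands (illustrative only, not the exact command lines I used):

  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[abcd]1
  mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/raid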

An update...
I just ran the test again (without reformatting the device) and noticed
that all four HDDs incremented their written-blocks counts equally.  This
implies that when the array was first configured, raid5 was doing its own
work (recovery) in the background...

What I'm not sure of is: if the device is newly formatted, would raid
recovery still happen?  What else could explain the difference in the
first run of the IO benchmark?


Thanks.

On Fri, Apr 8, 2011 at 4:46 PM, NeilBrown <neilb@suse.de> wrote:
> On Fri, 8 Apr 2011 12:55:39 -0700 Linux Raid Study
> <linuxraid.study@gmail.com> wrote:
>
>> Hello,
>>
>> I have a raid device /dev/md0 based on 4 devices sd[abcd].
>
> Would this be raid0? raid1? raid5? raid6? raid10?
> It could make a difference.
>
>>
>> When I write 4GB to /dev/md0, I see following output from iostat...
>
> Are you writing directly to the /dev/md0, or to a filesystem mounted
> from /dev/md0?  It might be easier to explain in the second case, but you
> text suggests the first case.
>
>>
>> Ques:
>> Shouldn't I see write/sec to be same for all four drives? Why does
>> /dev/sdd always have higher value for  BlksWrtn/sec?
>> My strip size is 1MB.
>>
>> thanks for any pointers...
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            0.02    0.00    0.34    0.03    0.00   99.61
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>> sda               1.08       247.77       338.73   37478883   51237136
>> sda1              1.08       247.77       338.73   37478195   51237136
>> sdb               1.08       247.73       338.78   37472990   51245712
>> sdb1              1.08       247.73       338.78   37472302   51245712
>> sdc               1.10       247.82       338.66   37486670   51226640
>> sdc1              1.10       247.82       338.66   37485982   51226640
>> sdd               1.09       118.46       467.97   17918510   70786576
>> sdd1              1.09       118.45       467.97   17917822   70786576
>> md0              65.60       443.79      1002.42   67129812  151629440
>
> Doing the sums, for every 2 blocks written to md0 we see 3 blocks written to
> some underlying device.  That doesn't make much sense for a 4 drive array.
> If we assume that the extra writes to sdd were from some other source, then
> It is closer to a 3:4 ratio which suggests raid5.
> So I'm guessing that the array is newly created and is recovering the data on
> sdd1 at the same time as you are doing the IO test.
> This would agree with the observation that sd[abc] see a lot more reads than
> sdd.
>
> I'll let you figure out the tps number.... do the math to find out the
> average blk/t number for each device.
>
> NeilBrown
>
>
>
>


* Re: iostat with raid device...
  2011-04-09  0:40   ` Linux Raid Study
@ 2011-04-09  8:50     ` Robin Hill
  2011-04-11  8:32       ` Linux Raid Study
  0 siblings, 1 reply; 16+ messages in thread
From: Robin Hill @ 2011-04-09  8:50 UTC (permalink / raw)
  To: Linux Raid Study; +Cc: NeilBrown, linux-raid

On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:

> What I'm not sure of is if the device is newly formatted, would raid
> recovery happen? What else could explain difference in the first run
> of IO benchmark?
> 
When an array is first created, it's created in a degraded state - this
is the simplest way to make it available to the user instantly. The
final drive(s) are then automatically rebuilt, calculating the
parity/data information as normal for recovering a drive.
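You can watch this happening, for example with (a rough check only, adjust
the device name to match):

  cat /proc/mdstat
  mdadm --detail /dev/md0

While the initial rebuild is running, /proc/mdstat should show a recovery
progress line for md0, and --detail should report the array as degraded
and recovering.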

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: iostat with raid device...
  2011-04-09  8:50     ` Robin Hill
@ 2011-04-11  8:32       ` Linux Raid Study
  2011-04-11  9:25         ` Robin Hill
  0 siblings, 1 reply; 16+ messages in thread
From: Linux Raid Study @ 2011-04-11  8:32 UTC (permalink / raw)
  To: Linux Raid Study, NeilBrown, linux-raid

Hi Robin,

Thanks.  So the uneven (unequal) distribution of write/sec numbers in
the iostat output is OK... is that correct?

Thanks.

On Sat, Apr 9, 2011 at 1:50 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:
>
>> What I'm not sure of is if the device is newly formatted, would raid
>> recovery happen? What else could explain difference in the first run
>> of IO benchmark?
>>
> When an array is first created, it's created in a degraded state - this
> is the simplest way to make it available to the user instantly. The
> final drive(s) are then automatically rebuilt, calculating the
> parity/data information as normal for recovering a drive.
>
> Cheers,
>    Robin
> --
>     ___
>    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
>   / / )      | Little Jim says ....                            |
>  // !!       |      "He fallen in de water !!"                 |
>


* Re: iostat with raid device...
  2011-04-11  8:32       ` Linux Raid Study
@ 2011-04-11  9:25         ` Robin Hill
  2011-04-11  9:36           ` Linux Raid Study
  0 siblings, 1 reply; 16+ messages in thread
From: Robin Hill @ 2011-04-11  9:25 UTC (permalink / raw)
  To: Linux Raid Study; +Cc: NeilBrown, linux-raid

On Mon Apr 11, 2011 at 01:32:34 -0700, Linux Raid Study wrote:
> On Sat, Apr 9, 2011 at 1:50 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> > On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:
> >
> >> What I'm not sure of is if the device is newly formatted, would raid
> >> recovery happen? What else could explain difference in the first run
> >> of IO benchmark?
> >>
> > When an array is first created, it's created in a degraded state - this
> > is the simplest way to make it available to the user instantly. The
> > final drive(s) are then automatically rebuilt, calculating the
> > parity/data information as normal for recovering a drive.
> >
> Thanks. So, the uneven (unequal) distribution of Wrtie/Sec numbers in
> the iostat output are ok...is that correct?
> 
If it hadn't completed the initial recovery, yes.  If it _had_ completed
the initial recovery then I'd expect writes to be balanced (barring
any differences in hardware).

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: iostat with raid device...
  2011-04-11  9:25         ` Robin Hill
@ 2011-04-11  9:36           ` Linux Raid Study
  2011-04-11  9:53             ` Robin Hill
  0 siblings, 1 reply; 16+ messages in thread
From: Linux Raid Study @ 2011-04-11  9:36 UTC (permalink / raw)
  To: Linux Raid Study, NeilBrown, linux-raid; +Cc: Robin Hill

The initial recovery should normally be done within the first few minutes...
this is a newly formatted disk so there isn't any user data on it.  So if I
run the IO benchmark after, say, 3-4 minutes, I should be OK?

mdadm --create /dev/md0 --level=5 ...
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/raid

...wait 3-4 min

run IO benchmark...

Am I correct?

Thanks.

On Mon, Apr 11, 2011 at 2:25 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> On Mon Apr 11, 2011 at 01:32:34 -0700, Linux Raid Study wrote:
>> On Sat, Apr 9, 2011 at 1:50 AM, Robin Hill <robin@robinhill.me.uk> wrote:
>> > On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:
>> >
>> >> What I'm not sure of is if the device is newly formatted, would raid
>> >> recovery happen? What else could explain difference in the first run
>> >> of IO benchmark?
>> >>
>> > When an array is first created, it's created in a degraded state - this
>> > is the simplest way to make it available to the user instantly. The
>> > final drive(s) are then automatically rebuilt, calculating the
>> > parity/data information as normal for recovering a drive.
>> >
>> Thanks. So, the uneven (unequal) distribution of Wrtie/Sec numbers in
>> the iostat output are ok...is that correct?
>>
> If it hadn't completed the initial recovery, yes.  If it _had_ completed
> the initial recovery then I'd expect writes to be balanced (barring
> any differences in hardware).
>
> Cheers,
>    Robin
> --
>     ___
>    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
>   / / )      | Little Jim says ....                            |
>  // !!       |      "He fallen in de water !!"                 |
>


* Re: iostat with raid device...
  2011-04-11  9:36           ` Linux Raid Study
@ 2011-04-11  9:53             ` Robin Hill
  2011-04-11 10:18               ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Robin Hill @ 2011-04-11  9:53 UTC (permalink / raw)
  To: Linux Raid Study; +Cc: linux-raid

On Mon Apr 11, 2011 at 02:36:50AM -0700, Linux Raid Study wrote:
> On Mon, Apr 11, 2011 at 2:25 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> > On Mon Apr 11, 2011 at 01:32:34 -0700, Linux Raid Study wrote:
> >> On Sat, Apr 9, 2011 at 1:50 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> >> > On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:
> >> >
> >> >> What I'm not sure of is if the device is newly formatted, would raid
> >> >> recovery happen? What else could explain difference in the first run
> >> >> of IO benchmark?
> >> >>
> >> > When an array is first created, it's created in a degraded state - this
> >> > is the simplest way to make it available to the user instantly. The
> >> > final drive(s) are then automatically rebuilt, calculating the
> >> > parity/data information as normal for recovering a drive.
> >> >
> >> Thanks. So, the uneven (unequal) distribution of Wrtie/Sec numbers in
> >> the iostat output are ok...is that correct?
> >>
> > If it hadn't completed the initial recovery, yes.  If it _had_ completed
> > the initial recovery then I'd expect writes to be balanced (barring
> > any differences in hardware).
> >
> The initial recovery should normally be done during first few minutes
> .... this is a newly formatted disk so there isn't any user data
> there. So, if I run the IO benchmark after say 3-4 min of doing, I
> should be ok?
> 
> mdam --create /dev/md0 --raid5....
> mount /dev/md0 /mnt/raid
> mkfs.ext4 /mnt/raid
> 
> ...wait 3-4 min
> 
> run IO benchmark...
> 
> Am I correct?
> 
No, depending on the size of the drives, the initial recovery can take
hours or even days.  For RAID5 with N drives, it needs to read the
entirety of (N-1) drives and write the entirety of the remaining drive
(whether there's any data or not, the initial state of the drives is
unknown, so parity has to be calculated for the entire array).

Check /proc/mdstat and wait until the array has completed resync before
running any benchmarks.
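If you want to script the wait, something along these lines should do (a
rough sketch only):

  while grep -Eq 'resync|recovery' /proc/mdstat; do sleep 60; done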

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: iostat with raid device...
  2011-04-11  9:53             ` Robin Hill
@ 2011-04-11 10:18               ` NeilBrown
  2011-04-12  1:57                 ` Linux Raid Study
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-04-11 10:18 UTC (permalink / raw)
  To: Robin Hill; +Cc: Linux Raid Study, linux-raid

On Mon, 11 Apr 2011 10:53:55 +0100 Robin Hill <robin@robinhill.me.uk> wrote:

> On Mon Apr 11, 2011 at 02:36:50AM -0700, Linux Raid Study wrote:
> > On Mon, Apr 11, 2011 at 2:25 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> > > On Mon Apr 11, 2011 at 01:32:34 -0700, Linux Raid Study wrote:
> > >> On Sat, Apr 9, 2011 at 1:50 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> > >> > On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:
> > >> >
> > >> >> What I'm not sure of is if the device is newly formatted, would raid
> > >> >> recovery happen? What else could explain difference in the first run
> > >> >> of IO benchmark?
> > >> >>
> > >> > When an array is first created, it's created in a degraded state - this
> > >> > is the simplest way to make it available to the user instantly. The
> > >> > final drive(s) are then automatically rebuilt, calculating the
> > >> > parity/data information as normal for recovering a drive.
> > >> >
> > >> Thanks. So, the uneven (unequal) distribution of Wrtie/Sec numbers in
> > >> the iostat output are ok...is that correct?
> > >>
> > > If it hadn't completed the initial recovery, yes.  If it _had_ completed
> > > the initial recovery then I'd expect writes to be balanced (barring
> > > any differences in hardware).
> > >
> > The initial recovery should normally be done during first few minutes
> > .... this is a newly formatted disk so there isn't any user data
> > there. So, if I run the IO benchmark after say 3-4 min of doing, I
> > should be ok?
> > 
> > mdam --create /dev/md0 --raid5....
> > mount /dev/md0 /mnt/raid
> > mkfs.ext4 /mnt/raid
> > 
> > ...wait 3-4 min
> > 
> > run IO benchmark...
> > 
> > Am I correct?
> > 
> No, depending on the size of the drives, the initial recovery can take
> hours or even days. For RAID5 with N drives, it needs to read the
> entirity of (N-1) drives, and write the entirity of the remaining drive
> (whether there's any data or not, the initial state of the drives is
> unknown so parity data has to be calculated for the entire array).
> 
> Check /proc/mdstat and wait until the array has completed resync before
> running any benchmarks.

or run
  mdadm --wait /dev/md0

or create the array with --assume-clean.  But if the array is raid5, don't
trust the data if a device fails:  use this only for testing.
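e.g. something like this (a sketch only; device names and options are
illustrative):

  mdadm --create /dev/md0 --level=5 --raid-devices=4 --assume-clean /dev/sd[abcd]1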

NeilBrown


> 
> Cheers,
>     Robin




* Re: iostat with raid device...
  2011-04-11 10:18               ` NeilBrown
@ 2011-04-12  1:57                 ` Linux Raid Study
  2011-04-12  2:51                   ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Linux Raid Study @ 2011-04-12  1:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: Robin Hill, linux-raid

If I use --assume-clean with mdadm, I see performance that is 10-15% lower
compared to the case where the option is not specified.  When I run without
--assume-clean, I wait until the "recovery done" message and then run the
IO benchmarks...

Is the perf drop expected?

Thanks.

On Mon, Apr 11, 2011 at 3:18 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 11 Apr 2011 10:53:55 +0100 Robin Hill <robin@robinhill.me.uk> wrote:
>
>> On Mon Apr 11, 2011 at 02:36:50AM -0700, Linux Raid Study wrote:
>> > On Mon, Apr 11, 2011 at 2:25 AM, Robin Hill <robin@robinhill.me.uk> wrote:
>> > > On Mon Apr 11, 2011 at 01:32:34 -0700, Linux Raid Study wrote:
>> > >> On Sat, Apr 9, 2011 at 1:50 AM, Robin Hill <robin@robinhill.me.uk> wrote:
>> > >> > On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:
>> > >> >
>> > >> >> What I'm not sure of is if the device is newly formatted, would raid
>> > >> >> recovery happen? What else could explain difference in the first run
>> > >> >> of IO benchmark?
>> > >> >>
>> > >> > When an array is first created, it's created in a degraded state - this
>> > >> > is the simplest way to make it available to the user instantly. The
>> > >> > final drive(s) are then automatically rebuilt, calculating the
>> > >> > parity/data information as normal for recovering a drive.
>> > >> >
>> > >> Thanks. So, the uneven (unequal) distribution of Wrtie/Sec numbers in
>> > >> the iostat output are ok...is that correct?
>> > >>
>> > > If it hadn't completed the initial recovery, yes.  If it _had_ completed
>> > > the initial recovery then I'd expect writes to be balanced (barring
>> > > any differences in hardware).
>> > >
>> > The initial recovery should normally be done during first few minutes
>> > .... this is a newly formatted disk so there isn't any user data
>> > there. So, if I run the IO benchmark after say 3-4 min of doing, I
>> > should be ok?
>> >
>> > mdam --create /dev/md0 --raid5....
>> > mount /dev/md0 /mnt/raid
>> > mkfs.ext4 /mnt/raid
>> >
>> > ...wait 3-4 min
>> >
>> > run IO benchmark...
>> >
>> > Am I correct?
>> >
>> No, depending on the size of the drives, the initial recovery can take
>> hours or even days. For RAID5 with N drives, it needs to read the
>> entirity of (N-1) drives, and write the entirity of the remaining drive
>> (whether there's any data or not, the initial state of the drives is
>> unknown so parity data has to be calculated for the entire array).
>>
>> Check /proc/mdstat and wait until the array has completed resync before
>> running any benchmarks.
>
> or run
>  mdadm --wait /dev/md0
>
> or create the array with --assume-clean.  But if the array is raid5, don't
> trust the data if a device fails:  use this only for testing.
>
> NeilBrown
>
>
>>
>> Cheers,
>>     Robin
>
>


* Re: iostat with raid device...
  2011-04-12  1:57                 ` Linux Raid Study
@ 2011-04-12  2:51                   ` NeilBrown
  2011-04-12 19:36                     ` Linux Raid Study
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2011-04-12  2:51 UTC (permalink / raw)
  To: Linux Raid Study; +Cc: Robin Hill, linux-raid

On Mon, 11 Apr 2011 18:57:34 -0700 Linux Raid Study
<linuxraid.study@gmail.com> wrote:

> If I use --assume-clean in mdadm, I see performance is 10-15% lower as
> compared to the case wherein this option is not specified. When I run
> without --assume_clean, I wait until mdadm prints "recovery_done" and
> then run IO benchmarks...
> 
> Is perf drop expected?

No.  And I cannot explain it.... unless the array is so tiny that it all fits
in the stripe cache (typically about 1Meg).
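(As a rough sketch of the arithmetic: stripe_cache_size is counted in pages
per member device, so the default of 256 entries comes to about 256 x 4KiB,
i.e. roughly 1MiB per member disk.)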

There really should be no difference.

NeilBrown


* Re: iostat with raid device...
  2011-04-12  2:51                   ` NeilBrown
@ 2011-04-12 19:36                     ` Linux Raid Study
  2011-04-13 18:21                       ` Linux Raid Study
  0 siblings, 1 reply; 16+ messages in thread
From: Linux Raid Study @ 2011-04-12 19:36 UTC (permalink / raw)
  To: NeilBrown; +Cc: Robin Hill, linux-raid

Hello Neil,

For benchmarking purposes, I've configured an array of ~30GB.
stripe_cache_size is 1024 (so 1M).

BTW, I'm using the Windows copy utility (robocopy) to test performance, and
I believe the block size it uses is 32kB.  But since everything gets written
through VFS, I'm not sure how to change stripe_cache_size to get optimal
performance with this setup...
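For reference, I assume these are the relevant knobs to look at
(illustrative commands, not the exact ones I ran):

  cat /sys/block/md0/md/stripe_cache_size
  mdadm --detail /dev/md0 | grep 'Chunk Size'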

Thanks.

On Mon, Apr 11, 2011 at 7:51 PM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 11 Apr 2011 18:57:34 -0700 Linux Raid Study
> <linuxraid.study@gmail.com> wrote:
>
>> If I use --assume-clean in mdadm, I see performance is 10-15% lower as
>> compared to the case wherein this option is not specified. When I run
>> without --assume_clean, I wait until mdadm prints "recovery_done" and
>> then run IO benchmarks...
>>
>> Is perf drop expected?
>
> No.  And I cannot explain it.... unless the array is so tiny that it all fits
> in the stripe cache (typically about 1Meg).
>
> There really should be no difference.
>
> NeilBrown
>


* Re: iostat with raid device...
  2011-04-12 19:36                     ` Linux Raid Study
@ 2011-04-13 18:21                       ` Linux Raid Study
  2011-04-13 21:00                         ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Linux Raid Study @ 2011-04-13 18:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: Robin Hill, linux-raid

Let me reword my previous email...

I tried changing stripe_cache_size as follows, trying values between 16 and 4096:

echo 512 > /sys/block/md0/md/stripe_cache_size

But I'm not seeing much difference in performance.  I'm running on a
2.6.27sh kernel.

Any ideas?

Thanks for your help...

On Tue, Apr 12, 2011 at 12:36 PM, Linux Raid Study
<linuxraid.study@gmail.com> wrote:
> Hello Neil,
>
> For the benchmarking purpose, I've configured array of ~30GB.
> stripe_cache_size is 1024 (so 1M).
>
> BTW, I'm using Windows copy (robocopy) utility to test perf and I
> believe block size it uses is 32kB. But since everything gets written
> thru VFS, I'm not sure how to change stripe_cache_size to get optimal
> performance with this setup...
>
> Thanks.
>
> On Mon, Apr 11, 2011 at 7:51 PM, NeilBrown <neilb@suse.de> wrote:
>> On Mon, 11 Apr 2011 18:57:34 -0700 Linux Raid Study
>> <linuxraid.study@gmail.com> wrote:
>>
>>> If I use --assume-clean in mdadm, I see performance is 10-15% lower as
>>> compared to the case wherein this option is not specified. When I run
>>> without --assume_clean, I wait until mdadm prints "recovery_done" and
>>> then run IO benchmarks...
>>>
>>> Is perf drop expected?
>>
>> No.  And I cannot explain it.... unless the array is so tiny that it all fits
>> in the stripe cache (typically about 1Meg).
>>
>> There really should be no difference.
>>
>> NeilBrown
>>
>


* Re: iostat with raid device...
  2011-04-13 18:21                       ` Linux Raid Study
@ 2011-04-13 21:00                         ` NeilBrown
  0 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2011-04-13 21:00 UTC (permalink / raw)
  To: Linux Raid Study; +Cc: Robin Hill, linux-raid

On Wed, 13 Apr 2011 11:21:52 -0700 Linux Raid Study
<linuxraid.study@gmail.com> wrote:

> Let me reword previous email...
> 
> I tried to change stripe_cache_size as following and tried values
> between 16 to 4096
> echo 512 > /sys/block/md0/md/stripe_cache_size
> 
> But, I'm not seeing too much difference in performance. I'm running on
> 2.6.27sh kernel.

I wouldn't expect much difference.

> 
> Any ideas...

On what exactly?
What exactly are you doing, what exactly are the results?  What exactly don't
you understand?

Details help.

NeilBrown



> 
> Thanks for your help...
> 
> On Tue, Apr 12, 2011 at 12:36 PM, Linux Raid Study
> <linuxraid.study@gmail.com> wrote:
> > Hello Neil,
> >
> > For the benchmarking purpose, I've configured array of ~30GB.
> > stripe_cache_size is 1024 (so 1M).
> >
> > BTW, I'm using Windows copy (robocopy) utility to test perf and I
> > believe block size it uses is 32kB. But since everything gets written
> > thru VFS, I'm not sure how to change stripe_cache_size to get optimal
> > performance with this setup...
> >
> > Thanks.
> >
> > On Mon, Apr 11, 2011 at 7:51 PM, NeilBrown <neilb@suse.de> wrote:
> >> On Mon, 11 Apr 2011 18:57:34 -0700 Linux Raid Study
> >> <linuxraid.study@gmail.com> wrote:
> >>
> >>> If I use --assume-clean in mdadm, I see performance is 10-15% lower as
> >>> compared to the case wherein this option is not specified. When I run
> >>> without --assume_clean, I wait until mdadm prints "recovery_done" and
> >>> then run IO benchmarks...
> >>>
> >>> Is perf drop expected?
> >>
> >> No.  And I cannot explain it.... unless the array is so tiny that it all fits
> >> in the stripe cache (typically about 1Meg).
> >>
> >> There really should be no difference.
> >>
> >> NeilBrown
> >>
> >



Thread overview: 16+ messages
2011-04-08 19:55 iostat with raid device Linux Raid Study
2011-04-08 22:05 ` Roberto Spadim
2011-04-08 22:10   ` Linux Raid Study
2011-04-08 23:46 ` NeilBrown
2011-04-09  0:40   ` Linux Raid Study
2011-04-09  8:50     ` Robin Hill
2011-04-11  8:32       ` Linux Raid Study
2011-04-11  9:25         ` Robin Hill
2011-04-11  9:36           ` Linux Raid Study
2011-04-11  9:53             ` Robin Hill
2011-04-11 10:18               ` NeilBrown
2011-04-12  1:57                 ` Linux Raid Study
2011-04-12  2:51                   ` NeilBrown
2011-04-12 19:36                     ` Linux Raid Study
2011-04-13 18:21                       ` Linux Raid Study
2011-04-13 21:00                         ` NeilBrown
