* Raid sync observations
@ 2005-12-20 17:18 Sebastian Kuzminsky
2005-12-20 18:14 ` Sebastian Kuzminsky
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Sebastian Kuzminsky @ 2005-12-20 17:18 UTC (permalink / raw)
To: linux-raid
I just created a RAID array (4-disk RAID-6). When "mdadm -C" returned,
/proc/mdstat showed it syncing the new array at about 17 MB/s. "vmstat 1"
showed hardly any blocks in or out, and an almost completely idle cpu.
Question 1: Why didn't the raid sync I/O show up with vmstat?
Question 2: Why was it limited to 17 MB per second? The maximum was
left at the default, 200 MB/s. The min was also at the default, 1 MB/s.
I get 60 MB/s per disk with "hdparm -tT" (that's using one disk at a time,
but still). The checksumming code does > 3 GB/s.
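For reference, those limits are the md sysctls; they can be checked and raised
roughly like this (a minimal sketch, values in KB/s, and the number written
below is just an example):

  cat /proc/sys/dev/raid/speed_limit_min     # default 1000   (1 MB/s)
  cat /proc/sys/dev/raid/speed_limit_max     # default 200000 (200 MB/s)
  # e.g. raise the floor so resync isn't throttled to a trickle by other IO:
  echo 50000 > /proc/sys/dev/raid/speed_limit_min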
Just curious really. This is with 2.6.15-rc5 and SATA disks via libata.
--
Sebastian Kuzminsky
* Re: Raid sync observations
2005-12-20 17:18 Raid sync observations Sebastian Kuzminsky
@ 2005-12-20 18:14 ` Sebastian Kuzminsky
2005-12-20 21:27 ` Neil Brown
2005-12-21 1:55 ` Christopher Smith
2 siblings, 0 replies; 12+ messages in thread
From: Sebastian Kuzminsky @ 2005-12-20 18:14 UTC (permalink / raw)
To: linux-raid
Sebastian Kuzminsky <seb@highlab.com> wrote:
> Question 1: Why didn't the raid sync I/O show up with vmstat?
>
> Question 2: Why was it limited to 17 MB per second? The maximum was
> left at the default, 200 MB/s. The min was also at the default, 1 MB/s.
> I get 60 MB/s per disk with "hdparm -tT" (that's using one disk at a time,
> but still). The checksumming code does > 3 GB/s.
Some more info... vmstat doesn't see it, but "iostat -m" does:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 3.96 0.00 0.00 96.04
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
hda 0.00 0.00 0.00 0 0
hde 0.00 0.00 0.00 0 0
hdg 0.00 0.00 0.00 0 0
hdi 0.00 0.00 0.00 0 0
hdk 0.00 0.00 0.00 0 0
sda 201.00 18.06 8.56 18 8
sdb 184.00 18.06 8.62 18 8
sdc 209.00 18.06 8.59 18 8
sdd 151.00 17.16 9.00 17 9
md0 0.00 0.00 0.00 0 0
md1 0.00 0.00 0.00 0 0
hda is the system disk, hd[egik] are PATA disks that make up md0, and
sd[abcd] are SATA disks that make up md1. md0 is idle, md1 is syncing.
This all makes more sense now - it's only getting 18 MB/s because it's
spending all that time writing.
But wait, why is it only writing half as much as it reads? This is a
4-disk RAID-6; as I understand it, it should read 2 strips and write 2
strips per stripe.
--
Sebastian Kuzminsky
* Re: Raid sync observations
@ 2005-12-20 18:27 Andrew Burgess
2005-12-20 19:13 ` Sebastian Kuzminsky
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2005-12-20 18:27 UTC (permalink / raw)
To: linux-raid
>I just created a RAID array (4-disk RAID-6). When "mdadm -C" returned,
>/proc/mdstat showed it syncing the new array at about 17 MB/s. "vmstat 1"
>showed hardly any blocks in or out, and an almost completely idle cpu.
>Question 1: Why didn't the raid sync I/O show up with vmstat?
I switched to iostat because of similar observations with vmstat. iostat
at least shows you which devices it is looking at and it agrees with
/proc/mdstat's numbers in my experience.
>Question 2: Why was it limited to 17 MB per second? The maximum was
>left at the default, 200 MB/s. The min was also at the default, 1 MB/s.
>I get 60 MB/s per disk with "hdparm -tT" (that's using one disk at a time,
>but still). The checksumming code does > 3 GB/s.
Try using dd to read each device in parallel as it might be a bus or controller
limitation. Also, sync requires writing interspersed with the reads which
unavoidably ruins the total throughput. Larger stripes should minimize this if
you think it'll be a problem during everyday use.
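A minimal sketch of that parallel-read test (device names are just examples;
watch the per-device rates with "iostat -m" in another terminal):

  for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
      dd if=$d of=/dev/null bs=1M count=1024 &   # one sequential reader per disk
  done
  wait

If the combined rate is well below four times the single-disk rate, the bus or
controller is the bottleneck.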
* Re: Raid sync observations
2005-12-20 18:27 Andrew Burgess
@ 2005-12-20 19:13 ` Sebastian Kuzminsky
0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Kuzminsky @ 2005-12-20 19:13 UTC (permalink / raw)
To: linux-raid
Andrew Burgess <aab@cichlid.com> wrote:
> >Question 1: Why didn't the raid sync I/O show up with vmstat?
>
> I switched to iostat because of similar observations with vmstat. iostat
> at least shows you which devices it is looking at and it agrees with
> /proc/mdstat's numbers in my experience.
Right.
> >Question 2: Why was it limited to 17 MB per second? The maximum was
> >left at the default, 200 MB/s. The min was also at the default, 1 MB/s.
> >I get 60 MB/s per disk with "hdparm -tT" (that's using one disk at a time,
> >but still). The checksumming code does > 3 GB/s.
>
> Try using dd to read each device in parallel as it might be a bus or controller
> limitation. Also, sync requires writing interspersed with the reads which
> unavoidably ruins the total throughput. Larger stripes should minimize this if
> you think it'll be a problem during everyday use.
Before creating the raid, I ran "badblocks -n" (non-destructive read-write
mode) on all 4 disks in parallel, and I was getting about 14 MB/s read &
14 MB/s write, per disk, with iostat reporting device utilizations around
97% on all disks.
"badblocks" (read-only test) on all 4 in parallel reads ~56 MB/s per disk,
call it ~220 MB/s total, with device utilizations again up around 98%.
Not bad! I'll just call it sync access pattern overhead then.
The disks are Seagate Barracuda 7200.9 500 GB SATA-II. The controller
card is a Sonnet Tempo-X 8 SATA, based on the Marvell 6081 chipset.
It's a 64-bit/133-MHz PCI-X card plugged in to a 64-bit/66-MHz PCI-X
slot, so the limiting factor is the bus, which runs at about 500 MB/s.
I should be able to hang eight 60 MB/s disks off this card and still
not run into any I/O bottlenecks. :-)
--
Sebastian Kuzminsky
* Re: Raid sync observations
2005-12-20 17:18 Raid sync observations Sebastian Kuzminsky
2005-12-20 18:14 ` Sebastian Kuzminsky
@ 2005-12-20 21:27 ` Neil Brown
2005-12-20 21:33 ` Sebastian Kuzminsky
2005-12-21 1:55 ` Christopher Smith
2 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2005-12-20 21:27 UTC (permalink / raw)
To: Sebastian Kuzminsky; +Cc: linux-raid
On Tuesday December 20, seb@highlab.com wrote:
> I just created a RAID array (4-disk RAID-6). When "mdadm -C" returned,
> /proc/mdstat showed it syncing the new array at about 17 MB/s. "vmstat 1"
> showed hardly any blocks in or out, and an almost completely idle cpu.
>
>
> Question 1: Why didn't the raid sync I/O show up with vmstat?
Because resync doesn't use the 'vm' subsystem. It uses a private memory
pool, so it doesn't page in or page out.
As has been noted, the IO appears in iostat.
>
>
> Question 2: Why was it limited to 17 MB per second? The maximum was
> left at the default, 200 MB/s. The min was also at the default, 1 MB/s.
> I get 60 MB/s per disk with "hdparm -tT" (that's using one disk at a time,
> but still). The checksumming code does > 3 GB/s.
>
This has been at least partially answered already, however...
When doing resync, raid6 will read all the data blocks, calculate the
P and Q syndromes, and then write those out.
If you think about head movement, remembering that the devices holding
P and Q differ for each stripe, you will realise that the drives are
not streaming, so you will miss out on some speed.
You can get better speed on creation with
mdadm -C -l6 -n4 -x2 /dev/sd[ab] missing missing /dev/sd[cd]
i.e. create with 2 missing devices and 2 spares. Recovery onto the
spares will do purely sequential IO on all devices and so will go
much faster.
Doing this requires regenerating data blocks, which is more CPU
intensive than generating P and Q, so there might be some CPU
overhead, but the disk throughput is still much faster.
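Spelled out with long options, the same idea looks roughly like this (array
name and partitions are hypothetical):

  mdadm --create /dev/md1 --level=6 --raid-devices=4 --spare-devices=2 \
        /dev/sda1 /dev/sdb1 missing missing /dev/sdc1 /dev/sdd1
  cat /proc/mdstat   # should show "recovery" onto the two spares rather than "resync"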
In 2.6.16, raid6 resync has been improved somewhat. This will not
affect the initial sync, but resync after a crash will only write
P/Q blocks which are wrong, and there are usually very few of those.
This means that it will mostly do sequential reads on all drives, so
you get full device speed.
NeilBrown
* Re: Raid sync observations
2005-12-20 21:27 ` Neil Brown
@ 2005-12-20 21:33 ` Sebastian Kuzminsky
0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Kuzminsky @ 2005-12-20 21:33 UTC (permalink / raw)
To: linux-raid
Neil Brown <neilb@suse.de> wrote:
> You can get better speed on creation with
> mdadm -C -l6 -n4 -x2 /dev/sd[ab] missing missing /dev/sd[cd]
>
> i.e. create with 2 missing devices and 2 spares. Recovery onto the
> spares will do purely sequential IO on all devices and so will go
> much faster.
> Doing this requires regenerating data blocks, which is more CPU
> intensive than generating P and Q, so there might be some CPU
> overhead, but the disk throughput is still much faster.
>
> In 2.6.16, raid6 resync has been improved somewhat. This will not
> affect the initial sync, but resync after a crash will only write
> P/Q blocks which are wrong, and there are usually very few of those.
> This means that it will mostly do sequential reads on all drives, so
> you get full device speed.
Thanks!!
--
Sebastian Kuzminsky
* Re: Raid sync observations
@ 2005-12-20 22:09 Jeff Breidenbach
0 siblings, 0 replies; 12+ messages in thread
From: Jeff Breidenbach @ 2005-12-20 22:09 UTC (permalink / raw)
To: linux-raid
>I'll just call it sync access pattern overhead then.
As another data point, I've been adding more and more
drives to a RAID-1 array. Yesterday I just added a fourth
disk which is still syncing.
mdadm --grow /dev/md0 -n4
mdadm --manage /dev/md0 --add /dev/sde
md0 : active raid1 sde1[4] sdc1[0] sdb1[2] sdd1[1]
488383936 blocks [4/3] [UUU_]
[=====>...............] recovery = 27.6% (134963968/488383936) finish=1739.7min speed=3384K/sec
In general I'm seeing somewhere between 2 and 4 MB/s on the currently
running sync. Which is fine, no problem. The array is live and
currently handling some traffic. But these are pretty fast disks, and
running simple things like
hdparm -tT /dev/sde
nice dd_rescue /dev/md0 /dev/null
all show pretty big numbers. This provides some intuition that there
may be room for improvement in sync speed. Kernel 2.6.14, no fancy
RAID bitmaps involved.
Jeff
* Re: Raid sync observations
2005-12-20 17:18 Raid sync observations Sebastian Kuzminsky
2005-12-20 18:14 ` Sebastian Kuzminsky
2005-12-20 21:27 ` Neil Brown
@ 2005-12-21 1:55 ` Christopher Smith
2005-12-21 11:49 ` Andy Smith
2 siblings, 1 reply; 12+ messages in thread
From: Christopher Smith @ 2005-12-21 1:55 UTC (permalink / raw)
To: Sebastian Kuzminsky; +Cc: linux-raid
Sebastian Kuzminsky wrote:
> I just created a RAID array (4-disk RAID-6). When "mdadm -C" returned,
> /proc/mdstat showed it syncing the new array at about 17 MB/s. "vmstat 1"
> showed hardly any blocks in or out, and an almost completely idle cpu.
This isn't really relevant to your questions but...
Why would you use RAID6 and not RAID10 with four disks ?
CS
* Re: Raid sync observations
2005-12-21 1:55 ` Christopher Smith
@ 2005-12-21 11:49 ` Andy Smith
2005-12-21 17:02 ` Sebastian Kuzminsky
0 siblings, 1 reply; 12+ messages in thread
From: Andy Smith @ 2005-12-21 11:49 UTC (permalink / raw)
To: linux-raid
On Wed, Dec 21, 2005 at 12:55:47PM +1100, Christopher Smith wrote:
> Sebastian Kuzminsky wrote:
> >I just created a RAID array (4-disk RAID-6). When "mdadm -C" returned,
> >/proc/mdstat showed it syncing the new array at about 17 MB/s. "vmstat 1"
> >showed hardly any blocks in or out, and an almost completely idle cpu.
>
> This isn't really relevant to your questions but...
>
> Why would you use RAID6 and not RAID10 with four disks ?
I was wondering the same thing. It's true that RAID6 is guaranteed
to still run degraded after losing 2 devices, whereas a RAID10 on 4
devices could only lose 1 device from each RAID1. So there is some
small extra redundancy there.
But how does the performance for read and write compare?
* Re: Raid sync observations
2005-12-21 11:49 ` Andy Smith
@ 2005-12-21 17:02 ` Sebastian Kuzminsky
2005-12-21 17:13 ` Gordon Henderson
0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Kuzminsky @ 2005-12-21 17:02 UTC (permalink / raw)
To: linux-raid
Andy Smith <andy@lug.org.uk> wrote:
> On Wed, Dec 21, 2005 at 12:55:47PM +1100, Christopher Smith wrote:
> > Why would you use RAID6 and not RAID10 with four disks ?
>
> I was wondering the same thing. It's true that RAID6 is guaranteed
> to still run degraded after losing 2 devices, whereas a RAID10 on 4
> devices could only lose 1 device from each RAID1. So there is some
> small extra redundancy there.
That's the reason - better reliability. With 4-disk RAID-10, a 2-disk
failure has a 1/3 chance of causing array failure. With RAID-6, there
is no chance.
> But how does the performance for read and write compare?
Good question! I'll post some performance numbers of the RAID-6
configuration when I have it up and running.
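One rough way to get comparable streaming numbers once a filesystem is on it
(mount point hypothetical; use a file larger than RAM so the read isn't served
from cache, and a real benchmark such as bonnie++ tells you more):

  dd if=/dev/zero of=/mnt/md1/bigfile bs=1M count=4096 && sync   # streaming write
  dd if=/mnt/md1/bigfile of=/dev/null bs=1M                      # streaming read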
--
Sebastian Kuzminsky
* Re: Raid sync observations
2005-12-21 17:02 ` Sebastian Kuzminsky
@ 2005-12-21 17:13 ` Gordon Henderson
2006-01-08 18:18 ` Bill Davidsen
0 siblings, 1 reply; 12+ messages in thread
From: Gordon Henderson @ 2005-12-21 17:13 UTC (permalink / raw)
To: Sebastian Kuzminsky; +Cc: linux-raid
On Wed, 21 Dec 2005, Sebastian Kuzminsky wrote:
> > But how does the performance for read and write compare?
>
> Good question! I'll post some performance numbers of the RAID-6
> configuration when I have it up and running.
Post your hardware config too if you don't mind. I have one server with 8
drives, and for swap (which it never uses!) I created 2 x 4-disk RAID-6
arrays (same partition on all disks) and gave them to the kernel with
equal priority:
Filename Type Size Used Priority
/dev/md10 partition 1991800 0 1
/dev/md11 partition 1991800 0 1
md10 : active raid6 sdd2[3] sdc2[2] sdb2[1] sda2[0]
1991808 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
md11 : active raid6 sdh2[3] sdg2[2] sdf2[1] sde2[0]
1991808 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
/dev/md10:
Timing buffered disk reads: 64 MB in 0.66 seconds = 97.28 MB/sec
/dev/md11:
Timing buffered disk reads: 64 MB in 0.95 seconds = 67.59 MB/sec
md10 is on an on-board 4-port SII SATA controller; md11 is on 2 x 2-port
SII PCI cards. (The server is currently moderately loaded, so results are
a bit lower than usual.)
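For reference, equal-priority swap across two md devices is typically set up
something like this (a sketch, not necessarily this exact config):

  mkswap /dev/md10
  mkswap /dev/md11
  swapon -p 1 /dev/md10
  swapon -p 1 /dev/md11
  # or persistently in /etc/fstab:
  # /dev/md10  none  swap  sw,pri=1  0 0
  # /dev/md11  none  swap  sw,pri=1  0 0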
Cue the must/must not swap on RAID arguments ;-)
Happy Solstice!
Gordon
* Re: Raid sync observations
2005-12-21 17:13 ` Gordon Henderson
@ 2006-01-08 18:18 ` Bill Davidsen
0 siblings, 0 replies; 12+ messages in thread
From: Bill Davidsen @ 2006-01-08 18:18 UTC (permalink / raw)
To: Gordon Henderson; +Cc: Sebastian Kuzminsky, linux-raid
Gordon Henderson wrote:
>On Wed, 21 Dec 2005, Sebastian Kuzminsky wrote:
>
>
>
>>>But how does the performance for read and write compare?
>>>
>>>
>>Good question! I'll post some performance numbers of the RAID-6
>>configuration when I have it up and running.
>>
>>
>
>Post your hardware config too if you don't mind. I have one server with 8
>drives, and for swap (which it never uses!) I created 2 x 4-disk RAID-6
>arrays (same partition on all disks) and gave them to the kernel with
>equal priority:
>
>Filename Type Size Used Priority
>/dev/md10 partition 1991800 0 1
>/dev/md11 partition 1991800 0 1
>
>md10 : active raid6 sdd2[3] sdc2[2] sdb2[1] sda2[0]
> 1991808 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>
>md11 : active raid6 sdh2[3] sdg2[2] sdf2[1] sde2[0]
> 1991808 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>
>/dev/md10:
> Timing buffered disk reads: 64 MB in 0.66 seconds = 97.28 MB/sec
>
>/dev/md11:
> Timing buffered disk reads: 64 MB in 0.95 seconds = 67.59 MB/sec
>
>md10 is on an on-board 4-port SII SATA controller; md11 is on 2 x 2-port
>SII PCI cards. (The server is currently moderately loaded, so results are
>a bit lower than usual.)
>
>Cue the must/must not swap on RAID arguments ;-)
>
I wouldn't swap on RAID-6... performance is important, and swap is tiny
compared to disk size. I would go to 2GB partitions and four-way RAID-1,
since fast swap-in seems to make for better "feel" and writes are cached
somewhat.
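A sketch of that alternative (partition names hypothetical): one small
four-way mirror used as swap:

  mdadm --create /dev/md10 --level=1 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
  mkswap /dev/md10
  swapon -p 1 /dev/md10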
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979