linux-raid.vger.kernel.org archive mirror
* Slowww raid check (raid10, f2)
@ 2008-06-26 13:21 Jon Nelson
  2008-06-26 14:07 ` Keld Jørn Simonsen
  2008-06-26 14:24 ` Roger Heflin
  0 siblings, 2 replies; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 13:21 UTC (permalink / raw)
  To: Linux-Raid

A few months back, I converted my raid setup from raid5 to raid10,f2,
using the same disks and setup as before.
The setup is an AMD x86-64, 3600+ dual, making use of three 300 GB SATA disks:

The current raid looks like this:

md0 : active raid10 sdb4[0] sdc4[2] sdd4[1]
      460057152 blocks 64K chunks 2 far-copies [3/3] [UUU]
      bitmap: 1/439 pages [4KB], 512KB chunk, file: /md0.bitmap

/dev/md0:
        Version : 00.90.03
  Creation Time : Fri May 23 23:24:20 2008
     Raid Level : raid10
     Array Size : 460057152 (438.74 GiB 471.10 GB)
  Used Dev Size : 306704768 (292.50 GiB 314.07 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : /md0.bitmap

    Update Time : Thu Jun 26 08:16:52 2008
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=1, far=2
     Chunk Size : 64K

           UUID : ff4e969d:2f07be4e:8c61e068:8406cdc0
         Events : 0.1670

    Number   Major   Minor   RaidDevice State
       0       8       20        0      active sync   /dev/sdb4
       1       8       52        1      active sync   /dev/sdd4
       2       8       36        2      active sync   /dev/sdc4

As you can see, it's composed of 3x 292 GiB partitions (the other
partitions are unused or used for /boot, so no run-time I/O).

Individually, the disks are capable of some 70 MB/s (give or take).
The raid5 would take 2.5 hours to run a "check".
The raid10,f2 takes substantially longer:

Jun 23 02:30:01 turnip kernel: md: data-check of RAID array md0
Jun 23 07:17:46 turnip kernel: md: md0: data-check done.

Whaaa? 4.75 hours? That's 28MB/s end-to-end. That's about 40% of
actual disk speed. I expected it to be slower but not /that/ much
slower. What might be going on here?
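
(For what it's worth, that figure follows directly from the array size in the
mdadm output above and the roughly 4.75-hour window in the log:)

    # rough sanity check: array size (471.10 GB) divided by the check duration
    awk 'BEGIN { printf "%.1f MB/s\n", 471100 / (4.75 * 3600) }'   # ~27.6 MB/s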

-- 
Jon

* Re: Slowww raid check (raid10, f2)
  2008-06-26 13:21 Slowww raid check (raid10, f2) Jon Nelson
@ 2008-06-26 14:07 ` Keld Jørn Simonsen
  2008-06-26 20:03   ` Jon Nelson
  2008-06-26 14:24 ` Roger Heflin
  1 sibling, 1 reply; 8+ messages in thread
From: Keld Jørn Simonsen @ 2008-06-26 14:07 UTC (permalink / raw)
  To: Jon Nelson; +Cc: Linux-Raid

On Thu, Jun 26, 2008 at 08:21:49AM -0500, Jon Nelson wrote:
> A few months back, I converted my raid setup from raid5 to raid10,f2,
> using the same disks and setup as before.
> The setup is an AMD x86-64, 3600+ dual, making use of three 300 GB SATA disks:
> 
> The current raid looks like this:
> 
> md0 : active raid10 sdb4[0] sdc4[2] sdd4[1]
>       460057152 blocks 64K chunks 2 far-copies [3/3] [UUU]
>       bitmap: 1/439 pages [4KB], 512KB chunk, file: /md0.bitmap
> 
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Fri May 23 23:24:20 2008
>      Raid Level : raid10
>      Array Size : 460057152 (438.74 GiB 471.10 GB)
>   Used Dev Size : 306704768 (292.50 GiB 314.07 GB)
>    Raid Devices : 3
>   Total Devices : 3
> Preferred Minor : 0
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : /md0.bitmap
> 
>     Update Time : Thu Jun 26 08:16:52 2008
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : near=1, far=2
>      Chunk Size : 64K
> 
>            UUID : ff4e969d:2f07be4e:8c61e068:8406cdc0
>          Events : 0.1670
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       20        0      active sync   /dev/sdb4
>        1       8       52        1      active sync   /dev/sdd4
>        2       8       36        2      active sync   /dev/sdc4
> 
> As you can see, it's composed of 3x 292 GiB partitions (the other
> partitions are unused or used for /boot, so no run-time I/O).
> 
> Individually, the disks are capable of some 70 MB/s (give or take).
> The raid5 would take 2.5 hours to run a "check".
> The raid10,f2 takes substantially longer:
> 
> Jun 23 02:30:01 turnip kernel: md: data-check of RAID array md0
> Jun 23 07:17:46 turnip kernel: md: md0: data-check done.
> 
> Whaaa? 4.75 hours? That's 28MB/s end-to-end. That's about 40% of
> actual disk speed. I expected it to be slower but not /that/ much
> slower. What might be going on here?

It could be random IO, sort of. I am not sure how the checking is done,
but if it works in sequential block order there will be a lot of
head movement because of the striping layout of raid10,f2.

This could be improved if the check took one stripe layer at a
time. Maybe that is not possible if what is checked is that the contents
of one half of the mirror match the other. Another strategy would be to
check large chunks of data at a time, say 20 MB - that way a good deal
of sequential stripe reading would still be achieved.
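
As a rough illustration of the head movement, here is a toy model of how the two
copies of each chunk are placed under raid10,f2 on three disks. It assumes the
far copy of a chunk sits on the next disk, in the far half of that disk, which is
the usual description of the "far" layout (see md(4) for the exact details):

    # toy model: where the two copies of each chunk land on a 3-disk raid10,f2
    awk 'BEGIN {
      ndisks = 3
      for (c = 0; c < 6; c++) {
        near_disk = c % ndisks; row = int(c / ndisks)
        far_disk  = (c + 1) % ndisks
        printf "chunk %d: near copy disk%d row %d, far copy disk%d row %d of the far half\n",
               c, near_disk, row, far_disk, row
      }
    }'

So to compare the two copies of any one chunk, the array reads near the start of
one disk and past the midpoint of another, and a block-by-block sequential check
keeps every head bouncing between the two halves.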

best regards
keld

* Re: Slowww raid check (raid10, f2)
  2008-06-26 13:21 Slowww raid check (raid10, f2) Jon Nelson
  2008-06-26 14:07 ` Keld Jørn Simonsen
@ 2008-06-26 14:24 ` Roger Heflin
  2008-06-26 20:03   ` Jon Nelson
  1 sibling, 1 reply; 8+ messages in thread
From: Roger Heflin @ 2008-06-26 14:24 UTC (permalink / raw)
  To: Jon Nelson; +Cc: Linux-Raid

Jon Nelson wrote:
> A few months back, I converted my raid setup from raid5 to raid10,f2,
> using the same disks and setup as before.
> The setup is an AMD x86-64, 3600+ dual, making use of three 300 GB SATA disks:
> 
> The current raid looks like this:
> 
> md0 : active raid10 sdb4[0] sdc4[2] sdd4[1]
>       460057152 blocks 64K chunks 2 far-copies [3/3] [UUU]
>       bitmap: 1/439 pages [4KB], 512KB chunk, file: /md0.bitmap
> 
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Fri May 23 23:24:20 2008
>      Raid Level : raid10
>      Array Size : 460057152 (438.74 GiB 471.10 GB)
>   Used Dev Size : 306704768 (292.50 GiB 314.07 GB)
>    Raid Devices : 3
>   Total Devices : 3
> Preferred Minor : 0
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : /md0.bitmap
> 
>     Update Time : Thu Jun 26 08:16:52 2008
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : near=1, far=2
>      Chunk Size : 64K
> 
>            UUID : ff4e969d:2f07be4e:8c61e068:8406cdc0
>          Events : 0.1670
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       20        0      active sync   /dev/sdb4
>        1       8       52        1      active sync   /dev/sdd4
>        2       8       36        2      active sync   /dev/sdc4
> 
> As you can see, it's composed of 3x 292 GiB partitions (the other
> partitions are unused or used for /boot, so no run-time I/O).
> 
> Individually, the disks are capable of some 70 MB/s (give or take).
> The raid5 would take 2.5 hours to run a "check".
> The raid10,f2 takes substantially longer:
> 
> Jun 23 02:30:01 turnip kernel: md: data-check of RAID array md0
> Jun 23 07:17:46 turnip kernel: md: md0: data-check done.
> 
> Whaaa? 4.75 hours? That's 28MB/s end-to-end. That's about 40% of
> actual disk speed. I expected it to be slower but not /that/ much
> slower. What might be going on here?
> 

What kind of controller are you using, and how is it connected to the MB?

If it is a plain PCI controller (not PCIe, not PCI-X), those numbers are about right.

If it is on the MB but still wired in through a 32-bit/33 MHz PCI connection, that
is also about right.

If it is PCI-X, PCIe, or wired into the MB with a proper connection, then this
would be low.

Onboard controllers can be connected almost any way; I have seen nice fast
connections, and I have seen ones hung off standard PCI on the MB.

Do a test of "dd if=/dev/sdb4 of=/dev/null bs=64k" on 1, then 2, then all 3 disks
while watching "vmstat 1" and see how it scales.
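
One way to script that, with the device names taken from the mdstat output above
(the 30-second phases and the count= cap are arbitrary choices):

    # start reads on one, two, then three disks while "vmstat 1" runs elsewhere
    for dev in /dev/sdb4 /dev/sdd4 /dev/sdc4; do
        dd if=$dev of=/dev/null bs=64k count=500000 &   # ~30 GB per disk, then it stops
        sleep 30
    done
    wait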

                            Roger

* Re: Slowww raid check (raid10, f2)
  2008-06-26 14:24 ` Roger Heflin
@ 2008-06-26 20:03   ` Jon Nelson
  2008-06-26 20:13     ` Roger Heflin
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 20:03 UTC (permalink / raw)
  To: Roger Heflin, Linux-Raid

On Thu, Jun 26, 2008 at 9:24 AM, Roger Heflin <rogerheflin@gmail.com> wrote:
> Jon Nelson wrote:
> What kind of controller are you using, and how is it connected to the MB?

> If it is PCI-X, PCIe, or wired into the MB with a proper connection, then
> this would be low.

MCP55, built-in.
cat /proc/interrupts:

           CPU0       CPU1
  0:      67908  136036611   IO-APIC-edge      timer
  1:          0         10   IO-APIC-edge      i8042
  2:          0          0    XT-PIC-XT        cascade
  5:    8325169   15373702   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb1
  7:          0          0   IO-APIC-fasteoi   ohci_hcd:usb2
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-edge      acpi
 10:    3722699    7890387   IO-APIC-fasteoi   sata_nv
 11:          0          0   IO-APIC-fasteoi   sata_nv
 14:    1339948    1448257   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
4345:   62529065       1494   PCI-MSI-edge      eth1
4346:          8   60190576   PCI-MSI-edge      eth0
NMI:          0          0
LOC:  136110735  136110816
ERR:          0

> Do a test of "dd if=/dev/sdb4 of=/dev/null bs=64k" on 1, then 2, then all
> 3 disks while watching "vmstat 1" and see how it scales.

Start with 1, then 2, then 3. Then back to 2, then back to 1. Then done.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1    392   9760 704136  17656    0    0 67968    16 1985 2578  0 24 48 29
 1  1    392   9384 704632  17636    0    0 74900     0 1704 2540  0 26 45 29
 2  1    392   9992 703148  18036    0    0 74516     0 1750 2581  0 25 46 29
 2  0    392   9156 704096  18100    0    0 153856     0 4193 8686  0 55 25 20
 2  1    392   9240 704328  17892    0    0 147606    32 3990 8608  0 58 20 23
 3  0    392   9136 704444  17704    0    0 143434    52 3596 8087  0 52 17 30
 1  2    392   9492 703880  18068    0    0 136604    12 3438 7205  0 50 23 26
 1  2    392   9552 704272  17588    0    0 153984     0 3837 8461  0 57 21 21
 1  1    392   9812 704160  17368    0    0 149399     0 3760 8121  0 54 20 26
 2  1    392   9296 704464  17376    0    0 133546    32 3377 7822  0 52 18 30
 3  1    392   9240 704040  17796    0    0 152696    16 3811 7704  0 57 16 28
 3  3    392  10020 703296  17428    0    0 196994    36 5028 6354  0 75  1 23
 3  0    392   9152 704172  17332    0    0 197809    28 5030 5603  0 74  0 25
 2  2    392   9232 704440  17324    0    0 203131     0 5141 6030  0 75  0 24
 3  2    392   9680 704112  16988    0    0 201973     0 5105 5601  1 78  0 22
 2  1    400  10216 703656  17032    0    8 189088    52 4634 5853  0 69  0 31
 3  1    400   9112 704664  17004    0    0 188936    44 4721 5495  0 70  2 28
 1  4    400  10080 704132  17008    0    0 200736     4 5000 6037  0 78  1 21
 3  2    400   9212 705012  16800    0    0 146072    40 3724 6490  0 54 16 30
 1  1    400   9724 705988  17328    0    0 108857    32 2707 6034  0 39  9 51
 1  1    400   9164 706800  17436    0    0 144175     0 3580 8223  0 52 21 26
 1  2    400  10044 707708  17500    0    0 73452     0 1662 2560  0 26 46 27


-- 
Jon

* Re: Slowww raid check (raid10, f2)
  2008-06-26 14:07 ` Keld Jørn Simonsen
@ 2008-06-26 20:03   ` Jon Nelson
  0 siblings, 0 replies; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 20:03 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: Linux-Raid

On Thu, Jun 26, 2008 at 9:07 AM, Keld Jørn Simonsen <keld@dkuug.dk> wrote:

> It could be random IO, sort of. I am not sure how the checking is done,
> but if it works in sequential block order there will be a lot of
> head movement because of the striping layout of raid10,f2.

That's exactly what I was thinking. When I get some time, perhaps I'll
dig into the source.

-- 
Jon

* Re: Slowww raid check (raid10, f2)
  2008-06-26 20:03   ` Jon Nelson
@ 2008-06-26 20:13     ` Roger Heflin
  2008-06-26 20:22       ` Jon Nelson
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Heflin @ 2008-06-26 20:13 UTC (permalink / raw)
  To: Jon Nelson; +Cc: Linux-Raid

Jon Nelson wrote:

> MCP55, built-in.
> cat /proc/interrupts:
> 
>            CPU0       CPU1
>   0:      67908  136036611   IO-APIC-edge      timer
>   1:          0         10   IO-APIC-edge      i8042
>   2:          0          0    XT-PIC-XT        cascade
>   5:    8325169   15373702   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb1
>   7:          0          0   IO-APIC-fasteoi   ohci_hcd:usb2
>   8:          0          0   IO-APIC-edge      rtc
>   9:          0          0   IO-APIC-edge      acpi
>  10:    3722699    7890387   IO-APIC-fasteoi   sata_nv
>  11:          0          0   IO-APIC-fasteoi   sata_nv
>  14:    1339948    1448257   IO-APIC-edge      libata
>  15:          0          0   IO-APIC-edge      libata
> 4345:   62529065       1494   PCI-MSI-edge      eth1
> 4346:          8   60190576   PCI-MSI-edge      eth0
> NMI:          0          0
> LOC:  136110735  136110816
> ERR:          0
> 
>> Do a test of "dd if=/dev/sdb4 of=/dev/null bs=64k" on 1, then 2, then all
>> 3 disks while watching "vmstat 1" and see how it scales.
> 
> Start with 1, then 2, then 3. Then back to 2, then back to 1. Then done.
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  1  1    392   9760 704136  17656    0    0 67968    16 1985 2578  0 24 48 29
>  1  1    392   9384 704632  17636    0    0 74900     0 1704 2540  0 26 45 29
>  2  1    392   9992 703148  18036    0    0 74516     0 1750 2581  0 25 46 29
>  2  0    392   9156 704096  18100    0    0 153856     0 4193 8686  0 55 25 20
>  2  1    392   9240 704328  17892    0    0 147606    32 3990 8608  0 58 20 23
>  3  0    392   9136 704444  17704    0    0 143434    52 3596 8087  0 52 17 30
>  1  2    392   9492 703880  18068    0    0 136604    12 3438 7205  0 50 23 26
>  1  2    392   9552 704272  17588    0    0 153984     0 3837 8461  0 57 21 21
>  1  1    392   9812 704160  17368    0    0 149399     0 3760 8121  0 54 20 26
>  2  1    392   9296 704464  17376    0    0 133546    32 3377 7822  0 52 18 30
>  3  1    392   9240 704040  17796    0    0 152696    16 3811 7704  0 57 16 28
>  3  3    392  10020 703296  17428    0    0 196994    36 5028 6354  0 75  1 23
>  3  0    392   9152 704172  17332    0    0 197809    28 5030 5603  0 74  0 25
>  2  2    392   9232 704440  17324    0    0 203131     0 5141 6030  0 75  0 24
>  3  2    392   9680 704112  16988    0    0 201973     0 5105 5601  1 78  0 22
>  2  1    400  10216 703656  17032    0    8 189088    52 4634 5853  0 69  0 31
>  3  1    400   9112 704664  17004    0    0 188936    44 4721 5495  0 70  2 28
>  1  4    400  10080 704132  17008    0    0 200736     4 5000 6037  0 78  1 21
>  3  2    400   9212 705012  16800    0    0 146072    40 3724 6490  0 54 16 30
>  1  1    400   9724 705988  17328    0    0 108857    32 2707 6034  0 39  9 51
>  1  1    400   9164 706800  17436    0    0 144175     0 3580 8223  0 52 21 26
>  1  2    400  10044 707708  17500    0    0 73452     0 1662 2560  0 26 46 27
> 
> 

That is a good built-in controller, then; the scaling is almost perfect:
predicted would be 74, 158, 222 vs. measured 74, 154, 205.

                             Roger

* Re: Slowww raid check (raid10, f2)
  2008-06-26 20:13     ` Roger Heflin
@ 2008-06-26 20:22       ` Jon Nelson
  2008-06-26 20:47         ` Roger Heflin
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 20:22 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Linux-Raid

> That is a good built-in controller, then; the scaling is almost perfect:
> predicted would be 74, 158, 222 vs. measured 74, 154, 205.

I would really like to know how you arrived at what appear to be
fairly specific numbers.


-- 
Jon

* Re: Slowww raid check (raid10, f2)
  2008-06-26 20:22       ` Jon Nelson
@ 2008-06-26 20:47         ` Roger Heflin
  0 siblings, 0 replies; 8+ messages in thread
From: Roger Heflin @ 2008-06-26 20:47 UTC (permalink / raw)
  To: Jon Nelson; +Cc: Linux-Raid

Jon Nelson wrote:
>> That is a good built-in controller, then; the scaling is almost perfect:
>> predicted would be 74, 158, 222 vs. measured 74, 154, 205.
> 
> I would really like to know how you arrived at what appear to be
> fairly specific numbers.
> 
> 


From your test results each disk does about 74 MB/second; both together should
do 2x that in a perfect world (no interference with each other), and 3 should do
3x that. At 3x your system is 17 MB/second slower than perfect, which is pretty
good.

With a plain PCI controller (32-bit/33 MHz, the standard desktop slot, max
133 MB/second) the numbers look more like this:
70, 100, 115 (pretty bad).
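
A back-of-the-envelope check of why it flattens out like that (the 70 MB/s
per-disk figure is taken from the numbers above; a real PCI bus sustains somewhat
less than the theoretical peak):

    # 32-bit/33 MHz PCI: theoretical peak vs. what 1, 2, or 3 disks at ~70 MB/s want
    awk 'BEGIN {
      peak = 4 * 33.33              # 4 bytes per cycle at 33 MHz ~= 133 MB/s, shared
      for (n = 1; n <= 3; n++)
        printf "%d disk(s): want %3d MB/s, bus ceiling ~%.0f MB/s theoretical\n",
               n, n * 70, peak
    }'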

You could get better estimates of how much interference is going on by using
"vmstat 60", as that gives you a more accurate sustained number and a better idea
of what the disks can sustain over longer periods of time. Note, though, that
disks get slower the further in you go: if you start a dd and graph the vmstat
output, the speed will slowly decrease as you get toward the inside of the disk.
Either way, your built-in controller is pretty good.
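
One way to see the inner-vs-outer difference directly is to time raw reads at a
few offsets into one of the disks (device name and offsets are just examples;
skip is in units of bs, so skip=280000 with bs=1M starts about 280 GB in):

    # sample streaming read speed near the start, middle, and end of one disk
    for off in 0 140000 280000; do
        dd if=/dev/sdb of=/dev/null bs=1M skip=$off count=1024 2>&1 | tail -n 1
    done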

And the seeking around won't hurt too badly unless the block size is small. With
an 8 ms seek time you can read/write about 600 kB of data in the time one seek
takes, so if you seek, read, seek, read with 600 kB blocks you will get about 50%
of disk speed, but with smaller blocks the seeking eats more time than the
reads/writes do. With 1 MB blocks you spend more time reading/writing than
seeking; with 256 kB blocks, more time is spent seeking than reading/writing.
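
The same trade-off worked out numerically, with the 8 ms seek and 70 MB/s
streaming rate assumed above:

    # effective throughput when every block read is preceded by one seek
    awk 'BEGIN {
      seek = 0.008; rate = 70
      for (kb = 64; kb <= 2048; kb *= 2) {
        mb  = kb / 1024
        eff = mb / (seek + mb / rate)
        printf "%4d kB blocks: %5.1f MB/s (about %2.0f%% of streaming speed)\n",
               kb, eff, 100 * eff / rate
      }
    }'

Around 600 kB the seek and the read take roughly equal time, which is where the
50%-of-disk-speed figure comes from.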

                            Roger
