* Slowww raid check (raid10, f2)
@ 2008-06-26 13:21 Jon Nelson
2008-06-26 14:07 ` Keld Jørn Simonsen
2008-06-26 14:24 ` Roger Heflin
0 siblings, 2 replies; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 13:21 UTC (permalink / raw)
To: Linux-Raid
A few months back, I converted my raid setup from raid5 to raid10,f2,
using the same disks and setup as before.
The setup is an AMD x86-64, 3600+ dual, making use of three 300 GB SATA disks:
The current raid looks like this:
md0 : active raid10 sdb4[0] sdc4[2] sdd4[1]
460057152 blocks 64K chunks 2 far-copies [3/3] [UUU]
bitmap: 1/439 pages [4KB], 512KB chunk, file: /md0.bitmap
/dev/md0:
Version : 00.90.03
Creation Time : Fri May 23 23:24:20 2008
Raid Level : raid10
Array Size : 460057152 (438.74 GiB 471.10 GB)
Used Dev Size : 306704768 (292.50 GiB 314.07 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : /md0.bitmap
Update Time : Thu Jun 26 08:16:52 2008
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : near=1, far=2
Chunk Size : 64K
UUID : ff4e969d:2f07be4e:8c61e068:8406cdc0
Events : 0.1670
Number Major Minor RaidDevice State
0 8 20 0 active sync /dev/sdb4
1 8 52 1 active sync /dev/sdd4
2 8 36 2 active sync /dev/sdc4
As you can see, it consists of three 292 GiB partitions (the other
partitions are unused or used for /boot, so no run-time I/O).
Individually, the disks are capable of some 70 MB/s (give or take).
The raid5 would take 2.5 hours to run a "check".
The raid10,f2 takes substantially longer:
Jun 23 02:30:01 turnip kernel: md: data-check of RAID array md0
Jun 23 07:17:46 turnip kernel: md: md0: data-check done.
Whaaa? 4.75 hours? That's 28MB/s end-to-end. That's about 40% of
actual disk speed. I expected it to be slower but not /that/ much
slower. What might be going on here?
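(For reference, the 28 MB/s figure is just the array size divided by the
wall-clock time; a rough back-of-the-envelope sketch using the numbers
quoted above, nothing more:

  awk 'BEGIN {
      kib     = 460057152              # array size in KiB, from the mdadm output above
      seconds = 4*3600 + 47*60 + 45    # 02:30:01 -> 07:17:46
      printf "end-to-end: %.1f MB/s\n", kib * 1024 / 1e6 / seconds   # ~27 MB/s
  }'

and if the check has to read every device in full, each disk is averaging
only about 18 MB/s against its ~70 MB/s streaming rate.)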
--
Jon
* Re: Slowww raid check (raid10, f2)
2008-06-26 13:21 Slowww raid check (raid10, f2) Jon Nelson
@ 2008-06-26 14:07 ` Keld Jørn Simonsen
2008-06-26 20:03 ` Jon Nelson
2008-06-26 14:24 ` Roger Heflin
1 sibling, 1 reply; 8+ messages in thread
From: Keld Jørn Simonsen @ 2008-06-26 14:07 UTC (permalink / raw)
To: Jon Nelson; +Cc: Linux-Raid
On Thu, Jun 26, 2008 at 08:21:49AM -0500, Jon Nelson wrote:
> A few months back, I converted my raid setup from raid5 to raid10,f2,
> using the same disks and setup as before.
> The setup is an AMD x86-64, 3600+ dual, making use of three 300 GB SATA disks:
>
> The current raid looks like this:
>
> md0 : active raid10 sdb4[0] sdc4[2] sdd4[1]
> 460057152 blocks 64K chunks 2 far-copies [3/3] [UUU]
> bitmap: 1/439 pages [4KB], 512KB chunk, file: /md0.bitmap
>
> /dev/md0:
> Version : 00.90.03
> Creation Time : Fri May 23 23:24:20 2008
> Raid Level : raid10
> Array Size : 460057152 (438.74 GiB 471.10 GB)
> Used Dev Size : 306704768 (292.50 GiB 314.07 GB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Intent Bitmap : /md0.bitmap
>
> Update Time : Thu Jun 26 08:16:52 2008
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : near=1, far=2
> Chunk Size : 64K
>
> UUID : ff4e969d:2f07be4e:8c61e068:8406cdc0
> Events : 0.1670
>
> Number Major Minor RaidDevice State
> 0 8 20 0 active sync /dev/sdb4
> 1 8 52 1 active sync /dev/sdd4
> 2 8 36 2 active sync /dev/sdc4
>
> As you can see, it consists of three 292 GiB partitions (the other
> partitions are unused or used for /boot, so no run-time I/O).
>
> Individually, the disks are capable of some 70 MB/s (give or take).
> The raid5 would take 2.5 hours to run a "check".
> The raid10,f2 takes substantially longer:
>
> Jun 23 02:30:01 turnip kernel: md: data-check of RAID array md0
> Jun 23 07:17:46 turnip kernel: md: md0: data-check done.
>
> Whaaa? 4.75 hours? That's 28MB/s end-to-end. That's about 40% of
> actual disk speed. I expected it to be slower but not /that/ much
> slower. What might be going on here?
It could be random IO, sort of. I am not sure how the checking is done,
but if it reads in sequential block order there will be a lot of
head movement because of the striping layout of raid10,f2.
This could be improved if the check could take one stripe layer at a
time. Maybe that is not possible if what is checked is that the contents
of one copy of the mirror match the other. Another strategy could then be
to check large chunks of data at a time, say 20 MB - then a good deal of
sequential stripe reading would be achieved.
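To make the head movement concrete, here is my rough understanding of where
far=2 puts the two copies of each chunk on this 3-disk array (only a sketch;
the exact placement may differ from what md really does):

  awk 'BEGIN {
      n = 3; chunk = 64; half = 153352384   # devices, chunk size and half the per-device size, in KiB
      for (c = 0; c < 6; c++)
          printf "array chunk %d: copy 1 on disk %d @ %d KiB, copy 2 on disk %d @ %d KiB\n",
                 c, c % n, int(c / n) * chunk, (c + 1) % n, half + int(c / n) * chunk
  }'

The second copy always sits about 146 GiB further in, on a neighbouring disk,
so a check that compares copy against copy keeps pulling the heads between
the two halves of each drive.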
best regards
keld
* Re: Slowww raid check (raid10, f2)
2008-06-26 13:21 Slowww raid check (raid10, f2) Jon Nelson
2008-06-26 14:07 ` Keld Jørn Simonsen
@ 2008-06-26 14:24 ` Roger Heflin
2008-06-26 20:03 ` Jon Nelson
1 sibling, 1 reply; 8+ messages in thread
From: Roger Heflin @ 2008-06-26 14:24 UTC (permalink / raw)
To: Jon Nelson; +Cc: Linux-Raid
Jon Nelson wrote:
> A few months back, I converted my raid setup from raid5 to raid10,f2,
> using the same disks and setup as before.
> The setup is an AMD x86-64, 3600+ dual, making use of three 300 GB SATA disks:
>
> The current raid looks like this:
>
> md0 : active raid10 sdb4[0] sdc4[2] sdd4[1]
> 460057152 blocks 64K chunks 2 far-copies [3/3] [UUU]
> bitmap: 1/439 pages [4KB], 512KB chunk, file: /md0.bitmap
>
> /dev/md0:
> Version : 00.90.03
> Creation Time : Fri May 23 23:24:20 2008
> Raid Level : raid10
> Array Size : 460057152 (438.74 GiB 471.10 GB)
> Used Dev Size : 306704768 (292.50 GiB 314.07 GB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Intent Bitmap : /md0.bitmap
>
> Update Time : Thu Jun 26 08:16:52 2008
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : near=1, far=2
> Chunk Size : 64K
>
> UUID : ff4e969d:2f07be4e:8c61e068:8406cdc0
> Events : 0.1670
>
> Number Major Minor RaidDevice State
> 0 8 20 0 active sync /dev/sdb4
> 1 8 52 1 active sync /dev/sdd4
> 2 8 36 2 active sync /dev/sdc4
>
> As you can see, it consists of three 292 GiB partitions (the other
> partitions are unused or used for /boot, so no run-time I/O).
>
> Individually, the disks are capable of some 70 MB/s (give or take).
> The raid5 would take 2.5 hours to run a "check".
> The raid10,f2 takes substantially longer:
>
> Jun 23 02:30:01 turnip kernel: md: data-check of RAID array md0
> Jun 23 07:17:46 turnip kernel: md: md0: data-check done.
>
> Whaaa? 4.75 hours? That's 28MB/s end-to-end. That's about 40% of
> actual disk speed. I expected it to be slower but not /that/ much
> slower. What might be going on here?
>
What kind of controller are you using, and how is it connected to the MB?
If it is plain PCI (non-e, non-X), those numbers are about right.
If it is on the MB but still wired in through a 32-bit/33 MHz PCI slot, that
is also about right.
If it is either PCI-X, PCI-e, or wired into the MB with a proper connection then
this would be low.
The ones on the MB can be connected almost any way: I have seen nice fast
connections, and I have seen ones connected with standard PCI on the MB.
Do a test of "dd if=/dev/sdb4 of=/dev/null bs=64k" on 1, then 2, then all 3
disks while watching "vmstat 1" and see how it scales.
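One way to script that, if it helps (partition names taken from your mdstat;
the 10 GiB read size and the 30 second stagger are arbitrary choices, picked
so the 1/2/3-disk plateaus show up clearly in vmstat):

  # run as root; this only reads, nothing is written to the disks
  vmstat 1 > vmstat.log &
  vmstat_pid=$!
  pids=""
  for dev in /dev/sdb4 /dev/sdd4 /dev/sdc4; do
      dd if=$dev of=/dev/null bs=64k count=163840 &   # 64k * 163840 = 10 GiB per disk
      pids="$pids $!"
      sleep 30                                        # stagger the starts
  done
  wait $pids
  kill $vmstat_pid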
Roger
* Re: Slowww raid check (raid10, f2)
2008-06-26 14:24 ` Roger Heflin
@ 2008-06-26 20:03 ` Jon Nelson
2008-06-26 20:13 ` Roger Heflin
0 siblings, 1 reply; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 20:03 UTC (permalink / raw)
To: Roger Heflin, Linux-Raid
On Thu, Jun 26, 2008 at 9:24 AM, Roger Heflin <rogerheflin@gmail.com> wrote:
> Jon Nelson wrote:
> What kind of controller are you using, and how is it connected to the MB?
> If it is either PCI-X, PCI-e, or wired into the MB with a proper connection
> then this would be low.
MCP55, built-in.
cat /proc/interrupts:
CPU0 CPU1
0: 67908 136036611 IO-APIC-edge timer
1: 0 10 IO-APIC-edge i8042
2: 0 0 XT-PIC-XT cascade
5: 8325169 15373702 IO-APIC-fasteoi sata_nv, ehci_hcd:usb1
7: 0 0 IO-APIC-fasteoi ohci_hcd:usb2
8: 0 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-edge acpi
10: 3722699 7890387 IO-APIC-fasteoi sata_nv
11: 0 0 IO-APIC-fasteoi sata_nv
14: 1339948 1448257 IO-APIC-edge libata
15: 0 0 IO-APIC-edge libata
4345: 62529065 1494 PCI-MSI-edge eth1
4346: 8 60190576 PCI-MSI-edge eth0
NMI: 0 0
LOC: 136110735 136110816
ERR: 0
> Do a test of "dd if=/dev/sdb4 of=/dev/null bs=64k" on 1, then 2, then all 3
> disks while watching "vmstat 1" and see how it scales.
Start with 1, then 2, then 3. Then back to 2, then back to 1. Then done.
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 392 9760 704136 17656 0 0 67968 16 1985 2578 0 24 48 29
1 1 392 9384 704632 17636 0 0 74900 0 1704 2540 0 26 45 29
2 1 392 9992 703148 18036 0 0 74516 0 1750 2581 0 25 46 29
2 0 392 9156 704096 18100 0 0 153856 0 4193 8686 0 55 25 20
2 1 392 9240 704328 17892 0 0 147606 32 3990 8608 0 58 20 23
3 0 392 9136 704444 17704 0 0 143434 52 3596 8087 0 52 17 30
1 2 392 9492 703880 18068 0 0 136604 12 3438 7205 0 50 23 26
1 2 392 9552 704272 17588 0 0 153984 0 3837 8461 0 57 21 21
1 1 392 9812 704160 17368 0 0 149399 0 3760 8121 0 54 20 26
2 1 392 9296 704464 17376 0 0 133546 32 3377 7822 0 52 18 30
3 1 392 9240 704040 17796 0 0 152696 16 3811 7704 0 57 16 28
3 3 392 10020 703296 17428 0 0 196994 36 5028 6354 0 75 1 23
3 0 392 9152 704172 17332 0 0 197809 28 5030 5603 0 74 0 25
2 2 392 9232 704440 17324 0 0 203131 0 5141 6030 0 75 0 24
3 2 392 9680 704112 16988 0 0 201973 0 5105 5601 1 78 0 22
2 1 400 10216 703656 17032 0 8 189088 52 4634 5853 0 69 0 31
3 1 400 9112 704664 17004 0 0 188936 44 4721 5495 0 70 2 28
1 4 400 10080 704132 17008 0 0 200736 4 5000 6037 0 78 1 21
3 2 400 9212 705012 16800 0 0 146072 40 3724 6490 0 54 16 30
1 1 400 9724 705988 17328 0 0 108857 32 2707 6034 0 39 9 51
1 1 400 9164 706800 17436 0 0 144175 0 3580 8223 0 52 21 26
1 2 400 10044 707708 17500 0 0 73452 0 1662 2560 0 26 46 27
--
Jon
* Re: Slowww raid check (raid10, f2)
2008-06-26 14:07 ` Keld Jørn Simonsen
@ 2008-06-26 20:03 ` Jon Nelson
0 siblings, 0 replies; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 20:03 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: Linux-Raid
On Thu, Jun 26, 2008 at 9:07 AM, Keld Jørn Simonsen <keld@dkuug.dk> wrote:
> It could be random IO, sort of. I am not sure how the checking is done,
> but if it reads in sequential block order there will be a lot of
> head movement because of the striping layout of raid10,f2.
That's exactly what I was thinking. When I get some time, perhaps I'll
dig into the source.
--
Jon
* Re: Slowww raid check (raid10, f2)
2008-06-26 20:03 ` Jon Nelson
@ 2008-06-26 20:13 ` Roger Heflin
2008-06-26 20:22 ` Jon Nelson
0 siblings, 1 reply; 8+ messages in thread
From: Roger Heflin @ 2008-06-26 20:13 UTC (permalink / raw)
To: Jon Nelson; +Cc: Linux-Raid
Jon Nelson wrote:
> MCP55, built-in.
> cat /proc/interrupts:
>
> CPU0 CPU1
> 0: 67908 136036611 IO-APIC-edge timer
> 1: 0 10 IO-APIC-edge i8042
> 2: 0 0 XT-PIC-XT cascade
> 5: 8325169 15373702 IO-APIC-fasteoi sata_nv, ehci_hcd:usb1
> 7: 0 0 IO-APIC-fasteoi ohci_hcd:usb2
> 8: 0 0 IO-APIC-edge rtc
> 9: 0 0 IO-APIC-edge acpi
> 10: 3722699 7890387 IO-APIC-fasteoi sata_nv
> 11: 0 0 IO-APIC-fasteoi sata_nv
> 14: 1339948 1448257 IO-APIC-edge libata
> 15: 0 0 IO-APIC-edge libata
> 4345: 62529065 1494 PCI-MSI-edge eth1
> 4346: 8 60190576 PCI-MSI-edge eth0
> NMI: 0 0
> LOC: 136110735 136110816
> ERR: 0
>
>> Do a test of "dd if=/dev/sdb4 of=/dev/null bs=64k" on 1, then 2, then all 3
>> disks while watching "vmstat 1" and see how it scales.
>
> Start with 1, then 2, then 3. Then back to 2, then back to 1. Then done.
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 1 1 392 9760 704136 17656 0 0 67968 16 1985 2578 0 24 48 29
> 1 1 392 9384 704632 17636 0 0 74900 0 1704 2540 0 26 45 29
> 2 1 392 9992 703148 18036 0 0 74516 0 1750 2581 0 25 46 29
> 2 0 392 9156 704096 18100 0 0 153856 0 4193 8686 0 55 25 20
> 2 1 392 9240 704328 17892 0 0 147606 32 3990 8608 0 58 20 23
> 3 0 392 9136 704444 17704 0 0 143434 52 3596 8087 0 52 17 30
> 1 2 392 9492 703880 18068 0 0 136604 12 3438 7205 0 50 23 26
> 1 2 392 9552 704272 17588 0 0 153984 0 3837 8461 0 57 21 21
> 1 1 392 9812 704160 17368 0 0 149399 0 3760 8121 0 54 20 26
> 2 1 392 9296 704464 17376 0 0 133546 32 3377 7822 0 52 18 30
> 3 1 392 9240 704040 17796 0 0 152696 16 3811 7704 0 57 16 28
> 3 3 392 10020 703296 17428 0 0 196994 36 5028 6354 0 75 1 23
> 3 0 392 9152 704172 17332 0 0 197809 28 5030 5603 0 74 0 25
> 2 2 392 9232 704440 17324 0 0 203131 0 5141 6030 0 75 0 24
> 3 2 392 9680 704112 16988 0 0 201973 0 5105 5601 1 78 0 22
> 2 1 400 10216 703656 17032 0 8 189088 52 4634 5853 0 69 0 31
> 3 1 400 9112 704664 17004 0 0 188936 44 4721 5495 0 70 2 28
> 1 4 400 10080 704132 17008 0 0 200736 4 5000 6037 0 78 1 21
> 3 2 400 9212 705012 16800 0 0 146072 40 3724 6490 0 54 16 30
> 1 1 400 9724 705988 17328 0 0 108857 32 2707 6034 0 39 9 51
> 1 1 400 9164 706800 17436 0 0 144175 0 3580 8223 0 52 21 26
> 1 2 400 10044 707708 17500 0 0 73452 0 1662 2560 0 26 46 27
>
>
That is a good built-in controller then; the scaling is almost perfect:
predicted would be 74, 158, 222 MB/s vs. the measured 74, 154, 205.
Roger
* Re: Slowww raid check (raid10, f2)
2008-06-26 20:13 ` Roger Heflin
@ 2008-06-26 20:22 ` Jon Nelson
2008-06-26 20:47 ` Roger Heflin
0 siblings, 1 reply; 8+ messages in thread
From: Jon Nelson @ 2008-06-26 20:22 UTC (permalink / raw)
To: Roger Heflin; +Cc: Linux-Raid
> That is a good built-in controller then; the scaling is almost perfect:
> predicted would be 74, 158, 222 MB/s vs. the measured 74, 154, 205.
I would really like to know how you arrived at what appear to be
fairly specific numbers.
--
Jon
* Re: Slowww raid check (raid10, f2)
2008-06-26 20:22 ` Jon Nelson
@ 2008-06-26 20:47 ` Roger Heflin
0 siblings, 0 replies; 8+ messages in thread
From: Roger Heflin @ 2008-06-26 20:47 UTC (permalink / raw)
To: Jon Nelson; +Cc: Linux-Raid
Jon Nelson wrote:
>> That is a good built-in controller then; the scaling is almost perfect:
>> predicted would be 74, 158, 222 MB/s vs. the measured 74, 154, 205.
>
> I would really like to know how you arrived at what appear to be
> fairly specific numbers.
>
>
From your test results each disk does about 74 MB/s. Both together should do
2x that in a perfect world (no interference with each other), and 3 should do
3x that; at 3x your system is 17 MB/second slower than perfect, which is
pretty good.
With a plain PCI controller (32-bit/33 MHz, standard desktop, max 133
MB/second) the numbers look more like this:
70, 100, 115 (pretty bad).
You could get better estimates of how much interference is going on by using
"vmstat 60", as that will give you a more accurate sustained number and a
better idea of what the disks can sustain over longer periods of time. Note,
though, that the disks get slower the further you are into the disk: if you
start a dd and graph the vmstat output, the speed will slowly decrease as you
get toward the inside of the disk. But your built-in controller is pretty good.
And the seeking around won't hurt too badly unless the block size is small.
With an 8 ms seek time you can read/write about 600 KB of data in the time it
takes for a seek, so if you seek, read, seek, read with 600 KB blocks you will
get about 50% of disk speed, but if you do the same with smaller blocks the
seek time uses up more time than the reads/writes. With 1 MB blocks you spend
more time doing reads/writes; with 256 KB blocks more time is spent seeking
than reading/writing.
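Put differently, for a seek-read-seek-read pattern the effective rate is
roughly blocksize / (blocksize/streaming + seek time). With the same rough
numbers as above (75 MB/s streaming, 8 ms per seek, both just assumptions):

  awk 'BEGIN {
      stream = 75e6; seek = 0.008      # ~75 MB/s streaming, ~8 ms per seek (assumed)
      for (kb = 64; kb <= 2048; kb *= 2) {
          b = kb * 1024
          eff = b / (b / stream + seek)            # effective bytes per second
          printf "%5d kB blocks: %5.1f MB/s  (%2.0f%% of streaming)\n",
                 kb, eff / 1e6, 100 * eff / stream
      }
  }'

which crosses 50% of disk speed right around the 600 KB block size, as above.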
Roger