* RAID 5 performance issue.
@ 2007-10-03 9:53 Andrew Clayton
2007-10-03 16:43 ` Justin Piszcz
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-03 9:53 UTC (permalink / raw)
To: linux-raid
Hi,
Hardware:
Dual Opteron 2GHz CPUs. 2GB RAM. 4 x 250GB SATA hard drives. One (root file system) is connected to the onboard Silicon Image 3114 controller. The other 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I moved the 3 RAID disks off the onboard controller onto the card the other day to see if that would help; it didn't.
Software:
Fedora Core 6, 2.6.23-rc9 kernel.
Array/fs details:
Filesystems are XFS
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 xfs 20G 5.6G 14G 29% /
/dev/sda5 xfs 213G 3.6G 209G 2% /data
none tmpfs 1008M 0 1008M 0% /dev/shm
/dev/md0 xfs 466G 237G 229G 51% /home
/dev/md0 is currently mounted with the following options
noatime,logbufs=8,sunit=512,swidth=1024
sunit and swidth seem to be automatically set.
xfs_info shows
meta-data=/dev/md0 isize=256 agcount=16, agsize=7631168 blks
= sectsz=4096 attr=1
data = bsize=4096 blocks=122097920, imaxpct=25
= sunit=64 swidth=128 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=2
= sectsz=4096 sunit=1 blks, lazy-count=0
realtime =none extsz=524288 blocks=0, rtextents=0
The array has a 256k chunk size using left-symmetric layout.
/sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from
256 alleviates the problem at best).
I also currently have /sys/block/sd[bcd]/queue/nr_requests set to 512 (it
doesn't seem to have made any difference).
I have also done blockdev --setra 8192 /dev/sd[bcd] (also tried 16384 and 32768).
The IO scheduler is cfq for all devices.
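For reference, a minimal sketch of how those knobs are set, assuming the same
device names (stripe_cache_size counts cache entries, --setra is in 512-byte
sectors):
  # md stripe cache
  echo 4096 > /sys/block/md0/md/stripe_cache_size
  # per-disk request queue depth
  for d in sdb sdc sdd; do echo 512 > /sys/block/$d/queue/nr_requests; done
  # per-disk read-ahead (also tried 16384 and 32768)
  blockdev --setra 8192 /dev/sd[bcd]
  # current IO scheduler, active one shown in [brackets]
  cat /sys/block/sdb/queue/scheduler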
This machine acts as a file server for about 11 workstations. /home (the software RAID 5) is exported over NFS, whereby the clients mount their home directories (using autofs).
I set it up about 3 years ago and it has been fine. However, earlier this year we started noticing application stalls, e.g. firefox would become unresponsive and the window would grey out (under Compiz); this typically lasts 2-4 seconds.
During these stalls, I see the below iostat activity (taken at 2 second intervals on the file server). High iowait, high awaits. The stripe_cache_active maxes out and things kind of grind to a halt for a few seconds until stripe_cache_active starts shrinking.
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.25 0.00 99.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 5.47 0.00 40.80 14.91 0.05 9.73 7.18 3.93
sdb 0.00 0.00 1.49 1.49 5.97 9.95 10.67 0.06 18.50 9.00 2.69
sdc 0.00 0.00 0.00 2.99 0.00 15.92 10.67 0.01 4.17 4.17 1.24
sdd 0.00 0.00 0.50 2.49 1.99 13.93 10.67 0.02 5.67 5.67 1.69
md0 0.00 0.00 0.00 1.99 0.00 7.96 8.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 5.24 1.50 0.00 93.02
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 12.50 0.00 85.75 13.72 0.12 9.60 6.28 7.85
sdb 182.50 275.00 114.00 17.50 986.00 82.00 16.24 337.03 660.64 6.06 79.70
sdc 171.00 269.50 117.00 20.00 1012.00 94.00 16.15 315.35 677.73 5.86 80.25
sdd 149.00 278.00 107.00 18.50 940.00 84.00 16.32 311.83 705.33 6.33 79.40
md0 0.00 0.00 0.00 1012.00 0.00 8090.00 15.99 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.50 44.61 0.00 53.88
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 1.00 0.00 4.25 8.50 0.00 0.00 0.00 0.00
sdb 168.50 64.00 129.50 58.00 1114.00 508.00 17.30 645.37 1272.90 5.34 100.05
sdc 194.00 76.50 141.50 43.00 1232.00 360.00 17.26 664.01 916.30 5.42 100.05
sdd 172.00 90.50 114.50 50.00 996.00 456.00 17.65 662.54 977.28 6.08 100.05
md0 0.00 0.00 0.50 8.00 2.00 32.00 8.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 1.50 48.50 0.00 49.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 1.50 0.00 2.50 3.33 0.00 0.33 0.33 0.05
sdb 0.00 142.50 63.50 115.50 558.00 1030.00 17.74 484.58 2229.89 5.59 100.10
sdc 0.00 113.00 63.00 114.50 534.00 994.00 17.22 507.33 2879.95 5.64 100.10
sdd 0.00 118.50 56.50 87.00 482.00 740.00 17.03 546.09 2650.33 6.98 100.10
md0 0.00 0.00 1.00 2.00 6.00 8.00 9.33 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.25 86.03 0.00 12.72
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.50 0.00 1.50 0.00 6.25 8.33 0.00 1.33 0.67 0.10
sdb 0.00 171.00 0.00 238.50 0.00 2164.00 18.15 320.17 3555.60 4.20 100.10
sdc 0.00 172.00 0.00 195.50 0.00 1776.00 18.17 372.72 3696.45 5.12 100.10
sdd 0.00 188.50 0.00 144.50 0.00 1318.00 18.24 528.15 3935.08 6.93 100.10
md0 0.00 0.00 0.00 1.50 0.00 6.00 8.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.75 73.50 0.00 25.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.50 67.16 1.49 181.59 7.96 1564.18 17.17 119.48 1818.11 4.61 84.48
sdc 0.50 70.65 1.99 177.11 9.95 1588.06 17.84 232.45 2844.31 5.56 99.60
sdd 0.00 77.11 1.49 149.75 5.97 1371.14 18.21 484.19 4728.82 6.59 99.60
md0 0.00 0.00 0.00 1.99 0.00 11.94 12.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
When stracing firefox (on the client), during its stall period I see multi-second stalls in the open, close and unlink system calls, e.g.
open("/home/andrew/.mozilla/firefox/default.d9m/Cache/1A190CD5d01", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 39 <8.239256>
close(39) = 0 <1.125843>
When its behaving I get numbers more like:
open("/home/andrew/.mozilla/firefox/default.d9m/sessionstore-1.js",
O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0600) = 39 <0.008773>
close(39) = 0 <0.265877>
Not the same file, but sessionstore-1.js is 56K and 1A190CD5d01 is 37K.
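The <seconds> figures above are strace's per-syscall timings; a sketch of the
sort of invocation used, assuming it is attached to the running firefox
process (the PID here is just a placeholder):
  strace -f -T -e trace=open,close,unlink -p <firefox-pid>
-T prints the time spent in each system call in angle brackets, as shown above.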
vim also has noticeable stalls, probably when it is doing its swap file thing.
My music is stored on the server and it never seems to be affected (the player
accesses the files straight over NFS).
I have put up the current kernel config at
http://digital-domain.net/kernel/sw-raid5-issue/config
and the output of mdadm -D /dev/md0 at
http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D
If anyone has any ideas, I'm all ears.
Having just composed this message, I see this thread:
http://www.spinics.net/lists/raid/msg17190.html
I do remember seeing a lot of pdflush activity (using blktrace) around the
times of the stalls, but I don't seem to get the high CPU usage.
Cheers,
Andrew
* Re: RAID 5 performance issue.
2007-10-03 9:53 RAID 5 performance issue Andrew Clayton
@ 2007-10-03 16:43 ` Justin Piszcz
2007-10-03 16:48 ` Justin Piszcz
2007-10-03 20:19 ` Andrew Clayton
2007-10-03 17:53 ` Goswin von Brederlow
2007-10-05 20:25 ` Brendan Conoboy
2 siblings, 2 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-03 16:43 UTC (permalink / raw)
To: Andrew Clayton; +Cc: linux-raid
Have you checked fragmentation?
xfs_db -c frag -f /dev/md3
What does this report?
Justin.
On Wed, 3 Oct 2007, Andrew Clayton wrote:
> [...]
* Re: RAID 5 performance issue.
2007-10-03 16:43 ` Justin Piszcz
@ 2007-10-03 16:48 ` Justin Piszcz
2007-10-03 20:10 ` Andrew Clayton
2007-10-03 20:19 ` Andrew Clayton
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-03 16:48 UTC (permalink / raw)
To: Andrew Clayton; +Cc: linux-raid
Also, if it is software RAID, when you make the XFS filesystem on it, it
sets up a proper (and tuned) sunit/swidth, so why would you want to change
that?
Justin.
On Wed, 3 Oct 2007, Justin Piszcz wrote:
> Have you checked fragmentation?
>
> xfs_db -c frag -f /dev/md3
>
> What does this report?
>
> Justin.
>
> On Wed, 3 Oct 2007, Andrew Clayton wrote:
>
>> [...]
* Re: RAID 5 performance issue.
2007-10-03 9:53 RAID 5 performance issue Andrew Clayton
2007-10-03 16:43 ` Justin Piszcz
@ 2007-10-03 17:53 ` Goswin von Brederlow
2007-10-03 20:20 ` Andrew Clayton
2007-10-05 20:25 ` Brendan Conoboy
2 siblings, 1 reply; 56+ messages in thread
From: Goswin von Brederlow @ 2007-10-03 17:53 UTC (permalink / raw)
To: Andrew Clayton; +Cc: linux-raid
Andrew Clayton <andrew@pccl.info> writes:
> Hi,
>
> Hardware:
>
> Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file system) is connected to the onboard Silicon Image 3114 controller. The other 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I moved the 3 raid disks off the on board controller onto the card the other day to see if that would help, it didn't.
I would think the onboard controller is connected to the north or
south bridge and possibly hooked directly into HyperTransport. The
extra controller is PCI, so you are limited to a theoretical 128MiB/s.
For me the onboard chips do much better (though at higher CPU cost)
than PCI cards.
MfG
Goswin
* Re: RAID 5 performance issue.
2007-10-03 16:48 ` Justin Piszcz
@ 2007-10-03 20:10 ` Andrew Clayton
2007-10-03 20:16 ` Justin Piszcz
2007-10-06 12:30 ` Justin Piszcz
0 siblings, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-03 20:10 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, linux-raid
On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:
> Also if it is software raid, when you make the XFS filesyste, on it,
> it sets up a proper (and tuned) sunit/swidth, so why would you want
> to change that?
Oh I didn't, the sunit and swidth were set automatically. Do they look
sane? From reading the XFS section of the mount man page, I'm not
entirely sure what they specify and certainly wouldn't have any idea
what to set them to.
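For what it's worth, the automatic values do line up with the array geometry;
in the mount options they are in 512-byte sectors, so
  sunit  =  512 sectors = 256KiB = one chunk
  swidth = 1024 sectors = 512KiB = 256KiB x 2 data disks (3-disk RAID 5)
which is the same thing xfs_info reports in 4KiB blocks as sunit=64, swidth=128.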
> Justin.
Cheers,
Andrew
* Re: RAID 5 performance issue.
2007-10-03 20:10 ` Andrew Clayton
@ 2007-10-03 20:16 ` Justin Piszcz
2007-10-06 12:30 ` Justin Piszcz
1 sibling, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-03 20:16 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid
On Wed, 3 Oct 2007, Andrew Clayton wrote:
> On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:
>
>> Also if it is software raid, when you make the XFS filesyste, on it,
>> it sets up a proper (and tuned) sunit/swidth, so why would you want
>> to change that?
>
> Oh I didn't, the sunit and swidth were set automatically. Do they look
> sane?. From reading the XFS section of the mount man page, I'm not
> entirely sure what they specify and certainly wouldn't have any idea
> what to set them to.
>
>> Justin.
>
> Cheers,
>
> Andrew
>
You should not need to set them as mount options unless you are overriding
the defaults.
Justin.
* Re: RAID 5 performance issue.
2007-10-03 16:43 ` Justin Piszcz
2007-10-03 16:48 ` Justin Piszcz
@ 2007-10-03 20:19 ` Andrew Clayton
2007-10-03 20:35 ` Justin Piszcz
2007-10-03 20:36 ` David Rees
1 sibling, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-03 20:19 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, linux-raid
On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:
> Have you checked fragmentation?
You know, that never even occurred to me. I've gotten into the mindset
that it's generally not a problem under Linux.
> xfs_db -c frag -f /dev/md3
>
> What does this report?
# xfs_db -c frag -f /dev/md0
actual 1828276, ideal 1708782, fragmentation factor 6.54%
Good or bad?
Seeing as this filesystem will be three years old in December, that
doesn't seem overly bad.
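For reference, the factor is just the excess extents over the actual count:
  (1828276 - 1708782) / 1828276 = 0.0654, i.e. 6.54%
so a defrag could merge away roughly 1 in 15 extents.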
I'm currently looking at things like
http://lwn.net/Articles/249450/ and
http://lwn.net/Articles/242559/
for potential help; fortunately it seems I won't have too long to wait.
> Justin.
Cheers,
Andrew
* Re: RAID 5 performance issue.
2007-10-03 17:53 ` Goswin von Brederlow
@ 2007-10-03 20:20 ` Andrew Clayton
2007-10-03 20:48 ` Richard Scobie
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-03 20:20 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Andrew Clayton, linux-raid
On Wed, 03 Oct 2007 19:53:08 +0200, Goswin von Brederlow wrote:
> Andrew Clayton <andrew@pccl.info> writes:
>
> > Hi,
> >
> > Hardware:
> >
> > Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1
> > (root file system) is connected to the onboard Silicon Image 3114
> > controller. The other 3 (/home) are in a software RAID 5 connected
> > to a PCI Silicon Image 3124 card. I moved the 3 raid disks off the
> > on board controller onto the card the other day to see if that
> > would help, it didn't.
>
> I would think the onboard controller is connected to the north or
> south bridge and possibly hooked directly into the hyper
> transport. The extra controler is PCI so you are limited to
> theoretical 128MiB/s. For me the onboard chips do much better (though
> at higher cpu cost) than pci cards.
Yeah, I was wondering about that. It certainly hasn't improved things;
it's unclear whether it's made things any worse.
> MfG
> Goswin
Cheers,
Andrew
* Re: RAID 5 performance issue.
2007-10-03 20:19 ` Andrew Clayton
@ 2007-10-03 20:35 ` Justin Piszcz
2007-10-03 20:46 ` Andrew Clayton
2007-10-03 20:36 ` David Rees
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-03 20:35 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid
What does cat /sys/block/md0/md/mismatch_cnt say?
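(mismatch_cnt is only updated by a check, repair or resync pass; if one hasn't
run recently, a check can be kicked off with, for example:
  echo check > /sys/block/md0/md/sync_action
and the count read back once it finishes.)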
That fragmentation looks normal/fine.
Justin.
On Wed, 3 Oct 2007, Andrew Clayton wrote:
> On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:
>
>> Have you checked fragmentation?
>
> You know, that never even occurred to me. I've gotten into the mind set
> that it's generally not a problem under Linux.
>
>> xfs_db -c frag -f /dev/md3
>>
>> What does this report?
>
> # xfs_db -c frag -f /dev/md0
> actual 1828276, ideal 1708782, fragmentation factor 6.54%
>
> Good or bad?
>
> Seeing as this filesystem will be three years old in December, that
> doesn't seem overly bad.
>
>
> I'm currently looking to things like
>
> http://lwn.net/Articles/249450/ and
> http://lwn.net/Articles/242559/
>
> for potential help, fortunately it seems I won't have too long to wait.
>
>> Justin.
>
> Cheers,
>
> Andrew
>
* Re: RAID 5 performance issue.
2007-10-03 20:19 ` Andrew Clayton
2007-10-03 20:35 ` Justin Piszcz
@ 2007-10-03 20:36 ` David Rees
2007-10-03 20:48 ` Andrew Clayton
2007-10-04 14:08 ` Andrew Clayton
1 sibling, 2 replies; 56+ messages in thread
From: David Rees @ 2007-10-03 20:36 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, Andrew Clayton, linux-raid
On 10/3/07, Andrew Clayton <andrew@digital-domain.net> wrote:
> On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:
> > Have you checked fragmentation?
>
> You know, that never even occurred to me. I've gotten into the mind set
> that it's generally not a problem under Linux.
It's probably not the root cause, but certainly doesn't help things.
At least with XFS you have an easy way to defrag the filesystem
without even taking it offline.
> # xfs_db -c frag -f /dev/md0
> actual 1828276, ideal 1708782, fragmentation factor 6.54%
>
> Good or bad?
Not bad, but not that good, either. Try running xfs_fsr from a nightly
cronjob. By default, it will defrag mounted XFS filesystems for up to
2 hours. Typically this is enough to keep fragmentation well below 1%.
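A minimal sketch of such a cron entry, assuming xfs_fsr lives in /usr/sbin
(the two-hour default can also be given explicitly via -t, in seconds):
  # /etc/crontab: defragment mounted XFS filesystems nightly at 02:00
  0 2 * * * root /usr/sbin/xfs_fsr -t 7200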
-Dave
* Re: RAID 5 performance issue.
2007-10-03 20:35 ` Justin Piszcz
@ 2007-10-03 20:46 ` Andrew Clayton
0 siblings, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-03 20:46 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, linux-raid
On Wed, 3 Oct 2007 16:35:21 -0400 (EDT), Justin Piszcz wrote:
> What does cat /sys/block/md0/md/mismatch_cnt say?
$ cat /sys/block/md0/md/mismatch_cnt
0
> That fragmentation looks normal/fine.
Cool.
> Justin.
Andrew
* Re: RAID 5 performance issue.
2007-10-03 20:36 ` David Rees
@ 2007-10-03 20:48 ` Andrew Clayton
2007-10-04 14:08 ` Andrew Clayton
1 sibling, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-03 20:48 UTC (permalink / raw)
To: David Rees; +Cc: Justin Piszcz, Andrew Clayton, linux-raid
On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:
> > # xfs_db -c frag -f /dev/md0
> > actual 1828276, ideal 1708782, fragmentation factor 6.54%
> >
> > Good or bad?
>
> Not bad, but not that good, either. Try running xfs_fsr into a nightly
> cronjob. By default, it will defrag mounted xfs filesystems for up to
> 2 hours. Typically this is enough to keep fragmentation well below 1%.
Worth a shot.
> -Dave
Andrew
* Re: RAID 5 performance issue.
2007-10-03 20:20 ` Andrew Clayton
@ 2007-10-03 20:48 ` Richard Scobie
0 siblings, 0 replies; 56+ messages in thread
From: Richard Scobie @ 2007-10-03 20:48 UTC (permalink / raw)
To: linux-raid
Andrew Clayton wrote:
> Yeah, I was wondering about that. It certainly hasn't improved things,
> it's unclear if it's made things any worse..
>
Many 3124 cards are PCI-X, so if you have one of these (and you seem to
be using a server board which may well have PCI-X), bus performance is
not going to be an issue.
Regards,
Richard
* Re: RAID 5 performance issue.
2007-10-03 20:36 ` David Rees
2007-10-03 20:48 ` Andrew Clayton
@ 2007-10-04 14:08 ` Andrew Clayton
2007-10-04 14:09 ` Justin Piszcz
1 sibling, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 14:08 UTC (permalink / raw)
To: David Rees; +Cc: Justin Piszcz, Andrew Clayton, linux-raid
On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:
> Not bad, but not that good, either. Try running xfs_fsr into a nightly
> cronjob. By default, it will defrag mounted xfs filesystems for up to
> 2 hours. Typically this is enough to keep fragmentation well below 1%.
I ran it last night on the RAID array; it got the fragmentation down
to 1.07%. Unfortunately that doesn't seem to have helped.
> -Dave
Cheers,
Andrew
* Re: RAID 5 performance issue.
2007-10-04 14:08 ` Andrew Clayton
@ 2007-10-04 14:09 ` Justin Piszcz
2007-10-04 14:10 ` Justin Piszcz
2007-10-04 14:36 ` Andrew Clayton
0 siblings, 2 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 14:09 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
Is NCQ enabled on the drives?
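A quick way to check, assuming libata will pass the queries through for these
controllers: look at the reported queue depth and the drive's capability list,
e.g.
  cat /sys/block/sdb/device/queue_depth
  hdparm -I /dev/sdb | grep -i queue
A depth of 1, or no "Native Command Queueing" line from hdparm, means NCQ is
not in use.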
On Thu, 4 Oct 2007, Andrew Clayton wrote:
> On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:
>
>> Not bad, but not that good, either. Try running xfs_fsr into a nightly
>> cronjob. By default, it will defrag mounted xfs filesystems for up to
>> 2 hours. Typically this is enough to keep fragmentation well below 1%.
>
> I ran it last night on the raid array, it got the fragmentation down
> to 1.07%. Unfortunately that doesn't seemed to have helped.
>
>> -Dave
>
> Cheers,
>
> Andrew
>
* Re: RAID 5 performance issue.
2007-10-04 14:09 ` Justin Piszcz
@ 2007-10-04 14:10 ` Justin Piszcz
2007-10-04 14:44 ` Andrew Clayton
2007-10-04 14:36 ` Andrew Clayton
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 14:10 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007, Justin Piszcz wrote:
> Is NCQ enabled on the drives?
>
> On Thu, 4 Oct 2007, Andrew Clayton wrote:
>
>> On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:
>>
>>> Not bad, but not that good, either. Try running xfs_fsr into a nightly
>>> cronjob. By default, it will defrag mounted xfs filesystems for up to
>>> 2 hours. Typically this is enough to keep fragmentation well below 1%.
>>
>> I ran it last night on the raid array, it got the fragmentation down
>> to 1.07%. Unfortunately that doesn't seemed to have helped.
>>
>>> -Dave
>>
>> Cheers,
>>
>> Andrew
>>
Also, did performance just go to crap one day or was it gradual?
Justin.
* Re: RAID 5 performance issue.
2007-10-04 14:09 ` Justin Piszcz
2007-10-04 14:10 ` Justin Piszcz
@ 2007-10-04 14:36 ` Andrew Clayton
2007-10-04 14:39 ` Justin Piszcz
2007-10-04 14:39 ` Justin Piszcz
1 sibling, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 14:36 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:
> Is NCQ enabled on the drives?
I don't think the drives are capable of that. I don't see any mention
of NCQ in dmesg.
Andrew
* Re: RAID 5 performance issue.
2007-10-04 14:36 ` Andrew Clayton
@ 2007-10-04 14:39 ` Justin Piszcz
2007-10-04 15:03 ` Andrew Clayton
2007-10-04 14:39 ` Justin Piszcz
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 14:39 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007, Andrew Clayton wrote:
> On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:
>
>> Is NCQ enabled on the drives?
>
> I don't think the drives are capable of that. I don't seen any mention
> of NCQ in dmesg.
>
>
> Andrew
>
What type (make/model) are the drives?
True, the controller may not be able to do it either.
What types of disks/controllers again?
Justin.
* Re: RAID 5 performance issue.
2007-10-04 14:36 ` Andrew Clayton
2007-10-04 14:39 ` Justin Piszcz
@ 2007-10-04 14:39 ` Justin Piszcz
1 sibling, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 14:39 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007, Andrew Clayton wrote:
> On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:
>
>> Is NCQ enabled on the drives?
>
> I don't think the drives are capable of that. I don't seen any mention
> of NCQ in dmesg.
>
>
> Andrew
>
BTW, you may not see 'NCQ' in the kernel messages unless you enable AHCI.
Justin.
* Re: RAID 5 performance issue.
2007-10-04 14:10 ` Justin Piszcz
@ 2007-10-04 14:44 ` Andrew Clayton
2007-10-04 16:20 ` Justin Piszcz
2007-10-05 19:02 ` John Stoffel
0 siblings, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 14:44 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:
> Also, did performance just go to crap one day or was it gradual?
IIRC I just noticed one day that firefox and vim were stalling. That was
back in February/March I think. At the time the server was running a
2.6.18 kernel; since then I've tried a few kernels in between that and,
currently, 2.6.23-rc9.
Something seems to be periodically causing a lot of activity that
maxes out the stripe_cache for a few seconds (when I was trying
to look with blktrace, it seemed pdflush was doing a lot of activity
during this time).
What I had noticed just recently was that when I was the only one doing IO
on the server (no NFS running and I was logged in at the console), even
just patching the kernel was crawling to a halt.
> Justin.
Cheers,
Andrew
* Re: RAID 5 performance issue.
2007-10-04 14:39 ` Justin Piszcz
@ 2007-10-04 15:03 ` Andrew Clayton
2007-10-04 16:19 ` Justin Piszcz
2007-10-04 16:46 ` Steve Cousins
0 siblings, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 15:03 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:
> What type (make/model) of the drives?
The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100
> True, the controller may not be able to do it either.
>
> What types of disks/controllers again?
The RAID disks are currently connected to a Silicon Image PCI card and are
configured as a software RAID 5:
03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
Subsystem: Silicon Image, Inc. Unknown device 7124
Flags: bus master, stepping, 66MHz, medium devsel, latency 64, IRQ 16
Memory at feafec00 (64-bit, non-prefetchable) [size=128]
Memory at feaf0000 (64-bit, non-prefetchable) [size=32K]
I/O ports at bc00 [size=16]
Expansion ROM at fea00000 [disabled] [size=512K]
Capabilities: [64] Power Management version 2
Capabilities: [40] PCI-X non-bridge device
Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
The problem originated when the disks were connected to the onboard
Silicon Image 3114 controller.
> Justin.
Andrew
* Re: RAID 5 performance issue.
2007-10-04 15:03 ` Andrew Clayton
@ 2007-10-04 16:19 ` Justin Piszcz
2007-10-04 19:01 ` Andrew Clayton
2007-10-04 16:46 ` Steve Cousins
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 16:19 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007, Andrew Clayton wrote:
> On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:
>
>
>> What type (make/model) of the drives?
>
> The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100
>
>> True, the controller may not be able to do it either.
>>
>> What types of disks/controllers again?
>
> The RAID disks are currently connected to a Silicon Image PCI card are
> configured as a software RAID 5
>
> 03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
> Subsystem: Silicon Image, Inc. Unknown device 7124
> Flags: bus master, stepping, 66MHz, medium devsel, latency 64, IRQ 16
> Memory at feafec00 (64-bit, non-prefetchable) [size=128]
> Memory at feaf0000 (64-bit, non-prefetchable) [size=32K]
> I/O ports at bc00 [size=16]
> Expansion ROM at fea00000 [disabled] [size=512K]
> Capabilities: [64] Power Management version 2
> Capabilities: [40] PCI-X non-bridge device
> Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
>
>
> The problem originated when the disks where connected to the on board
> Silicon Image 3114 controller.
>
>> Justin.
>
> Andrew
7K250
http://www.itreviews.co.uk/hardware/h912.htm
http://techreport.com/articles.x/8362
"The T7K250 also supports Native Command Queuing (NCQ)."
You need to enable AHCI in order to reap the benefits though.
Justin.
* Re: RAID 5 performance issue.
2007-10-04 14:44 ` Andrew Clayton
@ 2007-10-04 16:20 ` Justin Piszcz
2007-10-04 18:26 ` Andrew Clayton
2007-10-05 19:02 ` John Stoffel
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 16:20 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007, Andrew Clayton wrote:
> On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:
>
>
>> Also, did performance just go to crap one day or was it gradual?
>
> IIRC I just noticed one day that firefox and vim was stalling. That was
> back in February/March I think. At the time the server was running a
> 2.6.18 kernel, since then I've tried a few kernels in between that and
> currently 2.6.23-rc9
>
> Something seems to be periodically causing a lot of activity that
> max's out the stripe_cache for a few seconds (when I was trying
> to look with blktrace, it seemed pdflush was doing a lot of activity
> during this time).
>
> What I had noticed just recently was when I was the only one doing IO
> on the server (no NFS running and I was logged in at the console) even
> just patching the kernel was crawling to a halt.
>
>> Justin.
>
> Cheers,
>
> Andrew
Besides the NCQ issue, your problem is a bit perplexing.
Just out of curiosity, have you run memtest86 for at least one pass to make
sure there were no problems with the memory?
Do you have a script showing all of the parameters that you use to
optimize the array?
Also mdadm -D /dev/md0 output please?
What distribution are you running? (not that it should matter, but just
curious)
Justin.
* Re: RAID 5 performance issue.
2007-10-04 15:03 ` Andrew Clayton
2007-10-04 16:19 ` Justin Piszcz
@ 2007-10-04 16:46 ` Steve Cousins
2007-10-04 17:06 ` Steve Cousins
2007-10-04 19:06 ` Andrew Clayton
1 sibling, 2 replies; 56+ messages in thread
From: Steve Cousins @ 2007-10-04 16:46 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid
Andrew Clayton wrote:
> On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:
>
>
>
>> What type (make/model) of the drives?
>>
>
> The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100
>
A couple of things:
1. I thought you had SATA drives
2. ATA-6 would be UDMA/133
The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2 versions
do have NCQ. If you do have SATA drives, are they SATA-1 or SATA-2?
Steve
* Re: RAID 5 performance issue.
2007-10-04 16:46 ` Steve Cousins
@ 2007-10-04 17:06 ` Steve Cousins
2007-10-04 19:06 ` Andrew Clayton
1 sibling, 0 replies; 56+ messages in thread
From: Steve Cousins @ 2007-10-04 17:06 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid
Steve Cousins wrote:
> A couple of things:
>
> 1. I thought you had SATA drives
> 2. ATA-6 would be UDMA/133
Number 2 is not correct. Sorry about that.
Steve
* Re: RAID 5 performance issue.
2007-10-04 16:20 ` Justin Piszcz
@ 2007-10-04 18:26 ` Andrew Clayton
2007-10-05 10:25 ` Justin Piszcz
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 18:26 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007 12:20:25 -0400 (EDT), Justin Piszcz wrote:
>
>
> On Thu, 4 Oct 2007, Andrew Clayton wrote:
>
> > On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:
> >
> >
> >> Also, did performance just go to crap one day or was it gradual?
> >
> > IIRC I just noticed one day that firefox and vim was stalling. That
> > was back in February/March I think. At the time the server was
> > running a 2.6.18 kernel, since then I've tried a few kernels in
> > between that and currently 2.6.23-rc9
> >
> > Something seems to be periodically causing a lot of activity that
> > max's out the stripe_cache for a few seconds (when I was trying
> > to look with blktrace, it seemed pdflush was doing a lot of activity
> > during this time).
> >
> > What I had noticed just recently was when I was the only one doing
> > IO on the server (no NFS running and I was logged in at the
> > console) even just patching the kernel was crawling to a halt.
> >
> >> Justin.
> >
> > Cheers,
> >
> > Andrew
>
> Besides the NCQ issue your problem is a bit perpelxing..
>
> Just out of curiosity have you run memtest86 for at least one pass to
> make sure there were no problems with the memory?
No I haven't.
> Do you have a script showing all of the parameters that you use to
> optimize the array?
No script. Nothing that I change really seems to make any difference.
Currently I have
/sys/block/md0/md/stripe_cache_size set to 16384.
It doesn't really seem to matter what I set it to, as the
stripe_cache_active will periodically reach that value and take a few
seconds to come back down.
/sys/block/sd[bcd]/queue/nr_requests is set to 512,
and readahead is set to 8192 on sd[bcd].
But none of that really seems to make any difference.
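For anyone following along, the cache usage can be watched live while a stall
happens, e.g.
  watch -n 1 cat /sys/block/md0/md/stripe_cache_active
which makes it easy to see it climb to the stripe_cache_size limit and drop
back a few seconds later.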
> Also mdadm -D /dev/md0 output please?
http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D
> What distribution are you running? (not that it should matter, but
> just curious)
Fedora Core 6 (though I'm fairly sure it was happening before
upgrading from Fedora Core 5)
The iostat output of the drives when the problem occurs looks like the
same profile as when the backup is going onto the USB 1.1 hard drive:
the IO wait goes up, the CPU % hits 100% and we see multi-second
await times. Which is why I thought maybe the onboard controller was a
bottleneck (like the USB 1.1 is really slow) and moved the disks onto
the PCI card. But when I saw that even patching the kernel was going
really slow, I thought it can't really be the problem, as it didn't use
to go that slow.
It's a tricky one...
> Justin.
Cheers,
Andrew
* Re: RAID 5 performance issue.
2007-10-04 16:19 ` Justin Piszcz
@ 2007-10-04 19:01 ` Andrew Clayton
0 siblings, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 19:01 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007 12:19:20 -0400 (EDT), Justin Piszcz wrote:
>
> 7K250
>
> http://www.itreviews.co.uk/hardware/h912.htm
>
> http://techreport.com/articles.x/8362
> "The T7K250 also supports Native Command Queuing (NCQ)."
>
> You need to enable AHCI in order to reap the benefits though.
Cheers, I'll take a look at that.
> Justin.
Andrew
* Re: RAID 5 performance issue.
2007-10-04 16:46 ` Steve Cousins
2007-10-04 17:06 ` Steve Cousins
@ 2007-10-04 19:06 ` Andrew Clayton
2007-10-05 10:20 ` Justin Piszcz
1 sibling, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 19:06 UTC (permalink / raw)
To: Steve Cousins; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid
On Thu, 04 Oct 2007 12:46:05 -0400, Steve Cousins wrote:
> Andrew Clayton wrote:
> > On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:
> >
> >
> > >> What type (make/model) of the drives?
> >> >
> > The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100
> > A couple of things:
>
> 1. I thought you had SATA drives
> 2. ATA-6 would be UDMA/133
>
> The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2
> versions do have NCQ. If you do have SATA drives, are they SATA-1 or
> SATA-2?
Not sure; I suspect SATA-1, seeing as we've had them nearly 3 years.
Some bits from dmesg
ata1: SATA max UDMA/100 cmd 0xffffc20000aa4880 ctl 0xffffc20000aa488a bmdma 0xffffc20000aa4800 irq 19
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100
ata1.00: 488397168 sectors, multi 16: LBA48
ata1.00: configured for UDMA/100
> Steve
Andrew
* Re: RAID 5 performance issue.
2007-10-04 19:06 ` Andrew Clayton
@ 2007-10-05 10:20 ` Justin Piszcz
0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 10:20 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Steve Cousins, David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007, Andrew Clayton wrote:
> On Thu, 04 Oct 2007 12:46:05 -0400, Steve Cousins wrote:
>
>> Andrew Clayton wrote:
>>> On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:
>>>
>>>
>>> >> What type (make/model) of the drives?
>>>> >
>>> The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100
>>> A couple of things:
>>
>> 1. I thought you had SATA drives
>> 2. ATA-6 would be UDMA/133
>>
>> The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2
>> versions do have NCQ. If you do have SATA drives, are they SATA-1 or
>> SATA-2?
>
> Not sure, I suspect SATA 1 seeing as we've had them nearly 3 years.
>
> Some bits from dmesg
>
> ata1: SATA max UDMA/100 cmd 0xffffc20000aa4880 ctl 0xffffc20000aa488a
> bmdma 0xff ffc20000aa4800 irq 19
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100
> ata1.00: 488397168 sectors, multi 16: LBA48
> ata1.00: configured for UDMA/100
>
>> Steve
>
> Andrew
Looks like SATA-1 (non-NCQ) to me.
Justin.
* Re: RAID 5 performance issue.
2007-10-04 18:26 ` Andrew Clayton
@ 2007-10-05 10:25 ` Justin Piszcz
2007-10-05 10:57 ` Andrew Clayton
0 siblings, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 10:25 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Thu, 4 Oct 2007, Andrew Clayton wrote:
> On Thu, 4 Oct 2007 12:20:25 -0400 (EDT), Justin Piszcz wrote:
>
>>
>>
>> On Thu, 4 Oct 2007, Andrew Clayton wrote:
>>
>>> On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:
>>>
>>>
>>>> Also, did performance just go to crap one day or was it gradual?
>>>
>>> IIRC I just noticed one day that firefox and vim was stalling. That
>>> was back in February/March I think. At the time the server was
>>> running a 2.6.18 kernel, since then I've tried a few kernels in
>>> between that and currently 2.6.23-rc9
>>>
>>> Something seems to be periodically causing a lot of activity that
>>> max's out the stripe_cache for a few seconds (when I was trying
>>> to look with blktrace, it seemed pdflush was doing a lot of activity
>>> during this time).
>>>
>>> What I had noticed just recently was when I was the only one doing
>>> IO on the server (no NFS running and I was logged in at the
>>> console) even just patching the kernel was crawling to a halt.
>>>
>>>> Justin.
>>>
>>> Cheers,
>>>
>>> Andrew
>>
>> Besides the NCQ issue your problem is a bit perpelxing..
>>
>> Just out of curiosity have you run memtest86 for at least one pass to
>> make sure there were no problems with the memory?
>
> No I haven't.
>
>> Do you have a script showing all of the parameters that you use to
>> optimize the array?
>
> No script, Nothing that I change really seems to make any difference.
>
> Currently I have set
>
> /sys/block/md0/md/stripe_cache_size set at 16384
>
> It doesn't really seem to matter what I set it to, as the
> stripe_cache_active will periodically reach that value and take a few
> seconds to come back down.
>
> /sys/block/sd[bcd]/queue/nr_requests to 512
>
> and set readhead to 8192 on sd[bcd]
>
> But none of that really seems to make any difference.
>
>> Also mdadm -D /dev/md0 output please?
>
> http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D
>
>> What distribution are you running? (not that it should matter, but
>> just curious)
>
> Fedora Core 6 (though I'm fairly sure it was happening before
> upgrading from Fedora Core 5)
>
> The iostat output of the drives when the problem occurs looks like the
> same profile as when the backup is going onto the USB 1.1 hard drive.
> The IO wait goes up, the cpu % is hitting 100% and we see multi second
> await times. Which is why I thought maybe the on board controller was a
> bottleneck, like the USB 1.1 is really slow and moved the disks onto
> the PCI card. But when I saw that even patching the kernel was going
> really slow I thought it can't really be the problem as it didn't used
> to go that slow.
>
> It's a tricky one...
>
>> Justin.
>
> Cheers,
>
> Andrew
So you have 3 SATA 1 disks:
http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D
Do you compile your own kernel or use the distribution's kernel?
What does cat /proc/interrupts say? This is important to see if your disk
controller(s) are sharing IRQs with other devices.
Also note that with only 3 disks in a RAID-5 you will not get stellar
performance, but regardless, it should not be 'hanging' as you have
mentioned. Just out of sheer curiosity, have you tried the AS scheduler?
CFQ is supposed to be better for multi-user performance, but I would be
highly interested if you used the AS scheduler -- would that change the
'hanging' problem you are noticing? I would give it a shot; also try
deadline and noop.
You probably want to keep the nr_requests to 128, the stripe_cache_size
to 8mb. The stripe size of 256k is probably optimal.
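(As a rough sketch, those settings can be applied at runtime via sysfs; the sd[bcd]/md0 names are the ones used earlier in this thread, the scheduler line covers the AS suggestion above, and the "8mb" figure is taken as the 8192-entry value used later in the thread, since stripe_cache_size counts cache entries rather than bytes:)
echo anticipatory > /sys/block/sdb/queue/scheduler
echo 128 > /sys/block/sdb/queue/nr_requests
# repeat the two lines above for sdc and sdd
echo 8192 > /sys/block/md0/md/stripe_cache_size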
Did you also re-mount the XFS partition with the default mount options (or
just take the sunit and swidth)?
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 10:25 ` Justin Piszcz
@ 2007-10-05 10:57 ` Andrew Clayton
2007-10-05 11:08 ` Justin Piszcz
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 10:57 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid
On Fri, 5 Oct 2007 06:25:20 -0400 (EDT), Justin Piszcz wrote:
> So you have 3 SATA 1 disks:
Yeah, 3 of them in the array, there is a fourth standalone disk which
contains the root fs from which the system boots..
> http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D
>
> Do you compile your own kernel or use the distribution's kernel?
Compile my own.
> What does cat /proc/interrupts say? This is important to see if your
> disk controller(s) are sharing IRQs with other devices.
$ cat /proc/interrupts
CPU0 CPU1
0: 132052 249369403 IO-APIC-edge timer
1: 202 52 IO-APIC-edge i8042
8: 0 1 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
14: 11483 172 IO-APIC-edge ide0
16: 18041195 4798850 IO-APIC-fasteoi sata_sil24
18: 86068930 27 IO-APIC-fasteoi eth0
19: 16127662 2138177 IO-APIC-fasteoi sata_sil, ohci_hcd:usb1, ohci_hcd:usb2
NMI: 0 0
LOC: 249368914 249368949
ERR: 0
sata_sil24 contains the raid array, sata_sil the root fs disk
>
> Also note with only 3 disks in a RAID-5 you will not get stellar
> performance, but regardless, it should not be 'hanging' as you have
> mentioned. Just out of sheer curiosity have you tried the AS
> scheduler? CFQ is supposed to be better for multi-user performance
> but I would be highly interested if you used the AS scheduler-- would
> that change the 'hanging' problem you are noticing? I would give it
> a shot, also try the deadline and noop.
I did try them briefly. I'll have another go.
> You probably want to keep the nr_requessts to 128, the
> stripe_cache_size to 8mb. The stripe size of 256k is probably
> optimal.
OK.
> Did you also re-mount the XFS partition with the default mount
> options (or just take the sunit and swidth)?
The /etc/fstab entry for the raid array is currently:
/dev/md0 /home xfs
noatime,logbufs=8 1 2
and mount says
/dev/md0 on /home type xfs (rw,noatime,logbufs=8)
and /proc/mounts
/dev/md0 /home xfs rw,noatime,logbufs=8,sunit=512,swidth=1024 0 0
So I guess mount or the kernel is setting the sunit and swidth values.
> Justin.
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 10:57 ` Andrew Clayton
@ 2007-10-05 11:08 ` Justin Piszcz
2007-10-05 12:53 ` Andrew Clayton
0 siblings, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 11:08 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Fri, 5 Oct 2007, Andrew Clayton wrote:
> On Fri, 5 Oct 2007 06:25:20 -0400 (EDT), Justin Piszcz wrote:
>
>
>> So you have 3 SATA 1 disks:
>
> Yeah, 3 of them in the array, there is a fourth standalone disk which
> contains the root fs from which the system boots..
>
>> http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D
>>
>> Do you compile your own kernel or use the distribution's kernel?
>
> Compile my own.
>
>> What does cat /proc/interrupts say? This is important to see if your
>> disk controller(s) are sharing IRQs with other devices.
>
> $ cat /proc/interrupts
> CPU0 CPU1
> 0: 132052 249369403 IO-APIC-edge timer
> 1: 202 52 IO-APIC-edge i8042
> 8: 0 1 IO-APIC-edge rtc
> 9: 0 0 IO-APIC-fasteoi acpi
> 14: 11483 172 IO-APIC-edge ide0
> 16: 18041195 4798850 IO-APIC-fasteoi sata_sil24
> 18: 86068930 27 IO-APIC-fasteoi eth0
> 19: 16127662 2138177 IO-APIC-fasteoi sata_sil, ohci_hcd:usb1, ohci_hcd:usb2
> NMI: 0 0
> LOC: 249368914 249368949
> ERR: 0
>
>
> sata_sil24 contains the raid array, sata_sil the root fs disk
>
>>
>> Also note with only 3 disks in a RAID-5 you will not get stellar
>> performance, but regardless, it should not be 'hanging' as you have
>> mentioned. Just out of sheer curiosity have you tried the AS
>> scheduler? CFQ is supposed to be better for multi-user performance
>> but I would be highly interested if you used the AS scheduler-- would
>> that change the 'hanging' problem you are noticing? I would give it
>> a shot, also try the deadline and noop.
>
> I did try them briefly. I'll have another go.
>
>> You probably want to keep the nr_requessts to 128, the
>> stripe_cache_size to 8mb. The stripe size of 256k is probably
>> optimal.
>
> OK.
>
>> Did you also re-mount the XFS partition with the default mount
>> options (or just take the sunit and swidth)?
>
> The /etc/fstab entry for the raid array is currently:
>
> /dev/md0 /home xfs
> noatime,logbufs=8 1 2
>
> and mount says
>
> /dev/md0 on /home type xfs (rw,noatime,logbufs=8)
>
> and /proc/mounts
>
> /dev/md0 /home xfs rw,noatime,logbufs=8,sunit=512,swidth=1024 0 0
>
> So I guess mount or the kernel is setting the sunit and swidth values.
>
>> Justin.
>
>
> Andrew
>
The sunit/swidth mount options come from when the filesystem was made, I
believe.
-N Causes the file system parameters to be printed out without
really creating the file system.
You should be able to run mkfs.xfs -N /dev/md0 to get that information.
/dev/md3 /r1 xfs noatime,nodiratime,logbufs=8,logbsize=262144 0 1
Try using the following options and the AS scheduler and let me know if
you still notice any 'hangs'
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 11:08 ` Justin Piszcz
@ 2007-10-05 12:53 ` Andrew Clayton
2007-10-05 13:18 ` Justin Piszcz
2007-10-05 13:30 ` Andrew Clayton
0 siblings, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 12:53 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid
On Fri, 5 Oct 2007 07:08:51 -0400 (EDT), Justin Piszcz wrote:
> The mount options are from when the filesystem was made for
> sunit/swidth I believe.
>
> -N Causes the file system parameters to be printed
> out without really creating the file system.
>
> You should be able to run mkfs.xfs -N /dev/md0 to get that
> information.
Can't do it while it's mounted. Would xfs_info show the same stuff?
> /dev/md3 /r1 xfs
> noatime,nodiratime,logbufs=8,logbsize=262144 0 1
>
> Try using the following options and the AS scheduler and let me know
> if you still notice any 'hangs'
OK, I've remounted (mount -o remount) with those options.
I've set the stripe_cache_size to 8192
I've set the nr_requests back to 128
I've set the schedulers to anticipatory.
Unfortunately problem remains.
I'll try the noop scheduler as I don't think I ever tried that one.
> Justin.
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 12:53 ` Andrew Clayton
@ 2007-10-05 13:18 ` Justin Piszcz
2007-10-05 13:30 ` Andrew Clayton
1 sibling, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 13:18 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid
On Fri, 5 Oct 2007, Andrew Clayton wrote:
> On Fri, 5 Oct 2007 07:08:51 -0400 (EDT), Justin Piszcz wrote:
>
>> The mount options are from when the filesystem was made for
>> sunit/swidth I believe.
>>
>> -N Causes the file system parameters to be printed
>> out without really creating the file system.
>>
>> You should be able to run mkfs.xfs -N /dev/md0 to get that
>> information.
>
> Can't do it while it's mounted. would xfs_info show the same stuff?
>
>> /dev/md3 /r1 xfs
>> noatime,nodiratime,logbufs=8,logbsize=262144 0 1
>>
>> Try using the following options and the AS scheduler and let me know
>> if you still notice any 'hangs'
>
> OK, I've remounted (mount -o remount) with those options.
> I've set the strip_cache_size to 8192
> I've set the nr_requests back to 128
> I've set the schedulers to anticipatory.
>
> Unfortunately problem remains.
>
> I'll try the noop scheduler as I don't think I ever tried that one.
>
>> Justin.
>
> Andrew
>
How are you measuring the problem? How can it be reproduced?
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 12:53 ` Andrew Clayton
2007-10-05 13:18 ` Justin Piszcz
@ 2007-10-05 13:30 ` Andrew Clayton
2007-10-05 14:07 ` Justin Piszcz
1 sibling, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 13:30 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, linux-raid
On Fri, 5 Oct 2007 13:53:12 +0100, Andrew Clayton wrote:
> Unfortunately problem remains.
>
> I'll try the noop scheduler as I don't think I ever tried that one.
Didn't help either, oh well.
If I hit the disk in my workstation with a big dd, then in iostat I see it
maxing out at about 40MB/sec with > 1 second awaits. The server seems to
hit this at a much lower rate, maybe < 10MB/sec.
I think I'm going to also move the raid disks back onto the onboard
controller (as Goswin von Brederlow said it should have more bandwidth
anyway) as the PCI card doesn't seem to have helped and I'm seeing soft
SATA resets coming from it.
e.g
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata6.00: irq_stat 0x00020002, device error via D2H FIS
ata6.00: cmd 35/00:00:07:4a:d9/00:04:02:00:00/e0 tag 0 cdb 0x0 data 524288 out
res 51/84:00:06:4e:d9/00:00:02:00:00/e2 Emask 0x10 (ATA bus error)
ata6: soft resetting port
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: configured for UDMA/100
ata6: EH complete
Just to confirm, I was seeing the problem with the on board controller
and thought moving the disks to the PCI card might help (at £35 it was
worth a shot!)
Cheers,
Andrew
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 13:30 ` Andrew Clayton
@ 2007-10-05 14:07 ` Justin Piszcz
2007-10-05 14:32 ` Andrew Clayton
2007-10-05 16:10 ` Andrew Clayton
0 siblings, 2 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 14:07 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Andrew Clayton, David Rees, linux-raid
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1590 bytes --]
On Fri, 5 Oct 2007, Andrew Clayton wrote:
> On Fri, 5 Oct 2007 13:53:12 +0100, Andrew Clayton wrote:
>
>
>> Unfortunately problem remains.
>>
>> I'll try the noop scheduler as I don't think I ever tried that one.
>
> Didn't help either, oh well.
>
> If I hit the disk in workstation with a big dd then in iostat I see it
> maxing out at about 40MB/sec with > 1 second await. The server seems to
> hit this with a much lower rate, < 10MB/sec maybe
>
> I think I'm going to also move the raid disks back onto the onboard
> controller (as Goswin von Brederlow said it should have more bandwidth
> anyway) as the PCI card doesn't seem to have helped and I'm seeing soft
> SATA resets coming from it.
>
> e.g
>
> ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> ata6.00: irq_stat 0x00020002, device error via D2H FIS
> ata6.00: cmd 35/00:00:07:4a:d9/00:04:02:00:00/e0 tag 0 cdb 0x0 data 524288 out
> res 51/84:00:06:4e:d9/00:00:02:00:00/e2 Emask 0x10 (ATA bus error)
> ata6: soft resetting port
> ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata6.00: configured for UDMA/100
> ata6: EH complete
>
>
> Just to confirm, I was seeing the problem with the on board controller
> and thought moving the disks to the PCI card might help (at £35 it was
> worth a shot!)
>
> Cheers,
>
> Andrew
>
Yikes, yeah, I would get them off the PCI card. What kind of motherboard is
it? If you don't have a PCI-e based board it probably won't help THAT
much, but it should still be better than placing 3 drives on a PCI card.
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 14:07 ` Justin Piszcz
@ 2007-10-05 14:32 ` Andrew Clayton
2007-10-05 16:10 ` Andrew Clayton
1 sibling, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 14:32 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, David Rees, linux-raid
On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:
> Yikes, yeah I would get them off the PCI card, what kind of
> motherboard is it? If you don't have a PCI-e based board it probably
> won't help THAT much but it still should be better than placing 3
> drives on a PCI card.
It's a Tyan Thunder K8S Pro S2882. No PCIe. Though given that simply
patching the kernel (on the RAID fs) slows to a crawl even when there's
no other disk activity, which I'm fairly sure it didn't used to, these
app stalls are certainly new. The only trouble is I don't have any
iostat profile from, say, a year ago when everything was OK. So I can't
be 100% sure that the current spikes of iowait and await etc. didn't
actually always happen and that it's really something else that's wrong.
> Justin.
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 14:07 ` Justin Piszcz
2007-10-05 14:32 ` Andrew Clayton
@ 2007-10-05 16:10 ` Andrew Clayton
2007-10-05 16:16 ` Justin Piszcz
2007-10-05 18:58 ` Richard Scobie
1 sibling, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 16:10 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, David Rees, linux-raid
On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:
> Yikes, yeah I would get them off the PCI card, what kind of
> motherboard is it? If you don't have a PCI-e based board it probably
> won't help THAT much but it still should be better than placing 3
> drives on a PCI card.
Moved the drives back onto the on board controller.
While I had the machine down I ran memtest86+ for about 5 mins, no
errors.
I also got the output of mkfs.xfs -f -N /dev/md0
meta-data=/dev/md0 isize=256 agcount=16, agsize=7631168 blks
= sectsz=4096 attr=0
data = bsize=4096 blocks=122097920, imaxpct=25
= sunit=64 swidth=128 blks, unwritten=1
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=32768, version=2
= sectsz=4096 sunit=1 blks, lazy-count=0
realtime =none extsz=524288 blocks=0, rtextents=0
> Justin.
Thanks for your help by the way.
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 16:10 ` Andrew Clayton
@ 2007-10-05 16:16 ` Justin Piszcz
2007-10-05 19:33 ` Andrew Clayton
2007-10-05 18:58 ` Richard Scobie
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 16:16 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Andrew Clayton, David Rees, linux-raid
On Fri, 5 Oct 2007, Andrew Clayton wrote:
> On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:
>
>> Yikes, yeah I would get them off the PCI card, what kind of
>> motherboard is it? If you don't have a PCI-e based board it probably
>> won't help THAT much but it still should be better than placing 3
>> drives on a PCI card.
>
> Moved the drives back onto the on board controller.
>
> While I had the machine down I ran memtest86+ for about 5 mins, no
> errors.
>
> I also got the output of mkfs.xfs -f -N /dev/md0
>
> meta-data=/dev/md0 isize=256 agcount=16, agsize=7631168 blks
> = sectsz=4096 attr=0
> data = bsize=4096 blocks=122097920, imaxpct=25
> = sunit=64 swidth=128 blks, unwritten=1
> naming =version 2 bsize=4096
> log =internal log bsize=4096 blocks=32768, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=0
> realtime =none extsz=524288 blocks=0, rtextents=0
>
>> Justin.
>
> Thanks for your help by the way.
>
> Andrew
>
Hm, unfortunately at this point I think I am out of ideas; you may need to
ask the XFS/linux-raid developers how to run blktrace during those
operations to figure out what is going on.
BTW: Last thing I can think of, did you make any changes to PREEMPTION in
the kernel, or do you disable it (SERVER)?
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 16:10 ` Andrew Clayton
2007-10-05 16:16 ` Justin Piszcz
@ 2007-10-05 18:58 ` Richard Scobie
2007-10-05 19:02 ` Justin Piszcz
1 sibling, 1 reply; 56+ messages in thread
From: Richard Scobie @ 2007-10-05 18:58 UTC (permalink / raw)
To: linux-raid
Have you had a look at the smartctl -a outputs of all the drives?
Possibly one drive is being slow to respond due to seek errors etc. but
I would perhaps expect to be seeing this in the log.
If you have a full backup and a spare drive, I would probably rotate it
through the array.
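(A sketch of what that check and rotation could look like; the member and spare device names below are purely illustrative, not taken from the actual mdadm -D output:)
# look for reallocated/pending sectors and logged errors on each member
smartctl -a /dev/sdb | egrep -i 'realloc|pending|error'
# hypothetical rotation: fail/remove a suspect member, then add the spare
mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
mdadm /dev/md0 --add /dev/sde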
Regards,
Richard
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 18:58 ` Richard Scobie
@ 2007-10-05 19:02 ` Justin Piszcz
0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 19:02 UTC (permalink / raw)
To: Richard Scobie; +Cc: linux-raid
On Sat, 6 Oct 2007, Richard Scobie wrote:
> Have you had a look at the smartctl -a outputs of all the drives?
>
> Possibly one drive is being slow to respond due to seek errors etc. but I
> would perhaps expect to be seeing this in the log.
>
> If you have a full backup and a spare drive, I would probably rotate it
> through the array.
>
> Regards,
>
> Richard
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Forgot about that, yeah post the smartctl -a output for each drive please.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-04 14:44 ` Andrew Clayton
2007-10-04 16:20 ` Justin Piszcz
@ 2007-10-05 19:02 ` John Stoffel
2007-10-05 19:42 ` Andrew Clayton
1 sibling, 1 reply; 56+ messages in thread
From: John Stoffel @ 2007-10-05 19:02 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid
>>>>> "Andrew" == Andrew Clayton <andrew@digital-domain.net> writes:
Andrew> On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:
>> Also, did performance just go to crap one day or was it gradual?
Andrew> IIRC I just noticed one day that firefox and vim was
Andrew> stalling. That was back in February/March I think. At the time
Andrew> the server was running a 2.6.18 kernel, since then I've tried
Andrew> a few kernels in between that and currently 2.6.23-rc9
Andrew> Something seems to be periodically causing a lot of activity
Andrew> that max's out the stripe_cache for a few seconds (when I was
Andrew> trying to look with blktrace, it seemed pdflush was doing a
Andrew> lot of activity during this time).
Andrew> What I had noticed just recently was when I was the only one
Andrew> doing IO on the server (no NFS running and I was logged in at
Andrew> the console) even just patching the kernel was crawling to a
Andrew> halt.
How much memory does this system have? Have you checked the output of
/proc/mtrr at all? There have been reports of systems with a bad
BIOS that gets the memory map wrong, causing access to memory to slow
down drastically.
So if you have 2gb of RAM, try booting with mem=1900m or something
like that and seeing if things are better for you.
Make sure your BIOS is upto the latest level as well.
John
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 16:16 ` Justin Piszcz
@ 2007-10-05 19:33 ` Andrew Clayton
0 siblings, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 19:33 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, David Rees, linux-raid
On Fri, 5 Oct 2007 12:16:07 -0400 (EDT), Justin Piszcz wrote:
>
> Hm, unfortunately at this point I think I am out of ideas you may
> need to ask the XFS/linux-raid developers how to run blktrace during
> those operations to figure out what is going on.
No problem, cheers.
> BTW: Last thing I can think of, did you make any changes to
> PREEMPTION in the kernel, or do you disable it (SERVER)?
I normally have it disabled, but did try with voluntary preemption, but
with no effect.
> Justin.
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 19:02 ` John Stoffel
@ 2007-10-05 19:42 ` Andrew Clayton
2007-10-05 20:56 ` John Stoffel
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 19:42 UTC (permalink / raw)
To: John Stoffel; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid
On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote:
>
> How much memory does this system have? Have you checked the output of
2GB
> /proc/mtrr at all? There' have been reports of systems with a bad
$ cat /proc/mtrr
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
> BIOS that gets the memory map wrong, causing access to memory to slow
> down drastically.
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
full dmesg (from 2.6.21-rc8-git2) at
http://digital-domain.net/kernel/sw-raid5-issue/dmesg
> So if you have 2gb of RAM, try booting with mem=1900m or something
Worth a shot.
> like that and seeing if things are better for you.
>
> Make sure your BIOS is upto the latest level as well.
Hmm, I'll see what's involved in that.
> John
Cheers,
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-03 9:53 RAID 5 performance issue Andrew Clayton
2007-10-03 16:43 ` Justin Piszcz
2007-10-03 17:53 ` Goswin von Brederlow
@ 2007-10-05 20:25 ` Brendan Conoboy
2007-10-06 0:38 ` Dean S. Messing
2 siblings, 1 reply; 56+ messages in thread
From: Brendan Conoboy @ 2007-10-05 20:25 UTC (permalink / raw)
To: Andrew Clayton; +Cc: linux-raid
Andrew Clayton wrote:
> If anyone has any idea's I'm all ears.
Hi Andrew,
Are you sure your drives are healthy? Try benchmarking each drive
individually and see if there is a dramatic performance difference
between any of them. One failing drive can slow down an entire array.
Only after you have determined that your drives are healthy when
accessed individually are combined results particularly meaningful. For
a generic SATA 1 drive you should expect a sustained raw read or write
in excess of 45 MB/s. Check both read and write (this will destroy
data) and make sure your cache is clear prior to the read test and after
the write test. If each drive is working at a reasonable rate
individually, you're ready to move on.
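(A minimal sketch of such a per-drive test, assuming GNU dd and a drive whose contents can be destroyed; sdb is just an example name:)
# read test: drop the page cache first so the disk is actually measured
sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/sdb of=/dev/null bs=1M count=1024
# write test: THIS DESTROYS DATA on /dev/sdb; time the sync as well so the
# page cache does not hide the real rate
time ( dd if=/dev/zero of=/dev/sdb bs=1M count=1024; sync )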
The next question is: What happens when you access more than one device
at the same time? You should either get nearly full combined
performance, max out CPU, or get throttled by bus bandwidth (An actual
kernel bug could also come into play here, but I tend to doubt it). Is
the onboard SATA controller real SATA or just an ATA-SATA converter? If
the latter, you're going to have trouble getting faster performance than
any one disk can give you at a time. The output of 'lspci' should tell
you if the onboard SATA controller is on its own bus or sharing space
with some other device. Pasting the output here would be useful.
Assuming you get good performance out of all 3 drives at the same time,
it's time to create a RAID 5 md device with the three, make sure your
parity is done building, then benchmark that. It's going to be slower
to write and a bit slower to read (especially if your CPU is maxed out),
but that is normal.
Assuming you get good performance out of your md device, it's time to
put your filesystem on the md device and benchmark that. If you use
ext3, remember to set the stride parameter per the raid howto. I am
unfamiliar with other fs/md interactions, so be sure to check.
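(For the ext3 case mentioned above, a sketch of the stride calculation for this array's geometry: 256k chunks over 4k blocks gives stride = 256/4 = 64. Illustration only, not a suggestion to reformat the existing array:)
mke2fs -j -b 4096 -E stride=64 /dev/md0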
If you're actually maxing out your bus bandwidth and the onboard sata
controller is on a different bus than the pci sata controller, try
balancing the drives between the two to get a larger combined pipe.
Good luck,
--
Brendan Conoboy / Red Hat, Inc. / blc@redhat.com
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 19:42 ` Andrew Clayton
@ 2007-10-05 20:56 ` John Stoffel
2007-10-07 17:22 ` Andrew Clayton
0 siblings, 1 reply; 56+ messages in thread
From: John Stoffel @ 2007-10-05 20:56 UTC (permalink / raw)
To: Andrew Clayton
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton,
linux-raid
Andrew> On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote:
>>
>> How much memory does this system have? Have you checked the output of
Andrew> 2GB
>> /proc/mtrr at all? There' have been reports of systems with a bad
Andrew> $ cat /proc/mtrr
Andrew> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
That looks to be good, all the memory is there all in the same
region. Oh well... it was a thought.
>> BIOS that gets the memory map wrong, causing access to memory to slow
>> down drastically.
Andrew> BIOS-provided physical RAM map:
Andrew> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Andrew> BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Andrew> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Andrew> BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
Andrew> BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
Andrew> BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
Andrew> BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
I dunno about this part.
Andrew> full dmesg (from 2.6.21-rc8-git2) at
Andrew> http://digital-domain.net/kernel/sw-raid5-issue/dmesg
>> So if you have 2gb of RAM, try booting with mem=1900m or something
Andrew> Worth a shot.
It might make a difference, might not. Do you have any kernel
debugging options turned on? That might also be an issue. Check your
.config, there are a couple of options which drastically slow down the
system.
>> like that and seeing if things are better for you.
>>
>> Make sure your BIOS is upto the latest level as well.
Andrew> Hmm, I'll see whats involved in that.
At this point, I don't suspect the BIOS any more.
Can you start a 'vmstat 1' in one window, then start whatever you do
to get crappy performance. That would be interesting to see.
John
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 20:25 ` Brendan Conoboy
@ 2007-10-06 0:38 ` Dean S. Messing
2007-10-06 8:18 ` Justin Piszcz
0 siblings, 1 reply; 56+ messages in thread
From: Dean S. Messing @ 2007-10-06 0:38 UTC (permalink / raw)
To: linux-raid; +Cc: blc
Brendan Conoboy wrote:
<snip>
> Is the onboard SATA controller real SATA or just an ATA-SATA
> converter? If the latter, you're going to have trouble getting faster
> performance than any one disk can give you at a time. The output of
> 'lspci' should tell you if the onboard SATA controller is on its own
> bus or sharing space with some other device. Pasting the output here
> would be useful.
<snip>
N00bee question:
How does one tell if a machine's disk controller is an ATA-SATA
converter?
The output of `lspci|fgrep -i sata' is:
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\
(rev 09)
suggests a real SATA. These references to ATA in "dmesg", however,
make me wonder.
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
ata1.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
ata2.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
Dean
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-06 0:38 ` Dean S. Messing
@ 2007-10-06 8:18 ` Justin Piszcz
2007-10-08 1:40 ` Dean S. Messing
0 siblings, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-06 8:18 UTC (permalink / raw)
To: Dean S. Messing; +Cc: linux-raid, blc
On Fri, 5 Oct 2007, Dean S. Messing wrote:
>
> Brendan Conoboy wrote:
> <snip>
>> Is the onboard SATA controller real SATA or just an ATA-SATA
>> converter? If the latter, you're going to have trouble getting faster
>> performance than any one disk can give you at a time. The output of
>> 'lspci' should tell you if the onboard SATA controller is on its own
>> bus or sharing space with some other device. Pasting the output here
>> would be useful.
> <snip>
>
> N00bee question:
>
> How does one tell if a machine's disk controller is an ATA-SATA
> converter?
>
> The output of `lspci|fgrep -i sata' is:
>
> 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\
> (rev 09)
>
> suggests a real SATA. These references to ATA in "dmesg", however,
> make me wonder.
>
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
> ata1.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata1.00: configured for UDMA/133
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
> ata2.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata2.00: configured for UDMA/133
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
> ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata3.00: configured for UDMA/133
>
>
> Dean
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
His drives are either really old and do not support NCQ or he is not using
AHCI in the BIOS.
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-03 20:10 ` Andrew Clayton
2007-10-03 20:16 ` Justin Piszcz
@ 2007-10-06 12:30 ` Justin Piszcz
2007-10-06 16:05 ` Justin Piszcz
1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-06 12:30 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid
On Wed, 3 Oct 2007, Andrew Clayton wrote:
> On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:
>
>> Also if it is software raid, when you make the XFS filesystem on it,
>> it sets up a proper (and tuned) sunit/swidth, so why would you want
>> to change that?
>
> Oh I didn't, the sunit and swidth were set automatically. Do they look
> sane?. From reading the XFS section of the mount man page, I'm not
> entirely sure what they specify and certainly wouldn't have any idea
> what to set them to.
>
>> Justin.
>
> Cheers,
>
> Andrew
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
As long as you ran mkfs.xfs /dev/md0 it should have optimized the
filesystem according to the disks beneath it.
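(For comparison, a sketch of what spelling that geometry out explicitly might look like for this 3-disk RAID-5 with 256k chunks, i.e. two data disks per stripe; it should match the sunit=64/swidth=128 blocks reported by xfs_info earlier:)
mkfs.xfs -d su=256k,sw=2 /dev/md0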
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-06 12:30 ` Justin Piszcz
@ 2007-10-06 16:05 ` Justin Piszcz
0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-06 16:05 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid
On Sat, 6 Oct 2007, Justin Piszcz wrote:
>
>
> On Wed, 3 Oct 2007, Andrew Clayton wrote:
>
>> On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:
>>
>>> Also if it is software raid, when you make the XFS filesystem on it,
>>> it sets up a proper (and tuned) sunit/swidth, so why would you want
>>> to change that?
>>
>> Oh I didn't, the sunit and swidth were set automatically. Do they look
>> sane?. From reading the XFS section of the mount man page, I'm not
>> entirely sure what they specify and certainly wouldn't have any idea
>> what to set them to.
>>
>>> Justin.
>>
>> Cheers,
>>
>> Andrew
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> As long as you ran mkfs.xfs /dev/md0 it should have optimized the filesystem
> according to the disks beneath it.
>
> Justin.
>
Also can you provide the smartctl -a /dev/sda
/dev/sdb
etc for each disk?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-05 20:56 ` John Stoffel
@ 2007-10-07 17:22 ` Andrew Clayton
2007-10-11 17:06 ` Bill Davidsen
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-07 17:22 UTC (permalink / raw)
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton,
linux-raid
On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:
> Can you start a 'vmstat 1' in one window, then start whatever you do
> to get crappy performance. That would be interesting to see.
In trying to find something simple that can show the problem I'm
seeing, I think I may have found the culprit.
Just testing on my machine at home, I made this simple program.
/* fslattest.c */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
int main(int argc, char *argv[])
{
        char file[255];

        if (argc < 2) {
                printf("Usage: fslattest file\n");
                exit(1);
        }

        strncpy(file, argv[1], 254);
        printf("Opening %s\n", file);

        while (1) {
                int testfd = open(file,
                        O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);
                close(testfd);
                unlink(file);
                sleep(1);
        }

        exit(0);
}
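(Nothing special is needed to build it, e.g.:)
gcc -Wall -o fslattest fslattest.c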
If I run this program under strace in my home directory (an XFS file
system on a (new) disk, no raid involved, all to itself), like
$ strace -T -e open ./fslattest test
It doesn't look too bad.
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.005043>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.016844>
If I then start up a dd in the same place.
$ dd if=/dev/zero of=bigfile bs=1M count=500
Then I see the problem I'm seeing at work.
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.000348>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.224636>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>
Doing the same on my other disk which is Ext3 and contains the root fs,
it doesn't ever stutter
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.015423>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000092>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000093>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000088>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000103>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000096>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000094>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000091>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000107>
Somewhere in there was the dd, but you can't tell.
I've found if I mount the XFS filesystem with nobarrier, the
latency is reduced to about 0.5 seconds with occasional spikes > 1
second.
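(A sketch of trying that on the fly, assuming the /home mount point from earlier; barriers trade write latency for safety on power loss, so this is a diagnostic rather than a fix:)
mount -o remount,nobarrier /home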
When doing this on the raid array.
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000071>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>
dd kicks in
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.222294>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.888863>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>
dd finishes
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>
I guess I should take this to the XFS folks.
> John
Cheers,
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-06 8:18 ` Justin Piszcz
@ 2007-10-08 1:40 ` Dean S. Messing
2007-10-08 8:44 ` Justin Piszcz
0 siblings, 1 reply; 56+ messages in thread
From: Dean S. Messing @ 2007-10-08 1:40 UTC (permalink / raw)
To: jpiszcz; +Cc: linux-raid, blc
Justin Piszcz wrote:
>On Fri, 5 Oct 2007, Dean S. Messing wrote:
>>
>> Brendan Conoboy wrote:
>> <snip>
>>> Is the onboard SATA controller real SATA or just an ATA-SATA
>>> converter? If the latter, you're going to have trouble getting faster
>>> performance than any one disk can give you at a time. The output of
>>> 'lspci' should tell you if the onboard SATA controller is on its own
>>> bus or sharing space with some other device. Pasting the output here
>>> would be useful.
>> <snip>
>>
>> N00bee question:
>>
>> How does one tell if a machine's disk controller is an ATA-SATA
>> converter?
>>
>> The output of `lspci|fgrep -i sata' is:
>>
>> 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\
>> (rev 09)
>>
>> suggests a real SATA. These references to ATA in "dmesg", however,
>> make me wonder.
>>
>> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
>> ata1.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> ata1.00: configured for UDMA/133
>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
>> ata2.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> ata2.00: configured for UDMA/133
>> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
>> ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> ata3.00: configured for UDMA/133
>>
>>
>> Dean
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>His drives are either really old and do not support NCQ or he is not using
>AHCI in the BIOS.
Sorry, Justin, if I wasn't clear. I was asking the N00bee question
about _my_own_ machine. The output of lspci (on my machine) seems to
indicate I have a "real" SATA controller on the motherboard, but the
contents of "dmesg", with the references to ATA-7 and UDMA/133, made
me wonder if I had just an ATA-SATA converter. Hence my question: how
does one tell definitively if one has a real SATA controller on the
motherboard?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-08 1:40 ` Dean S. Messing
@ 2007-10-08 8:44 ` Justin Piszcz
0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-08 8:44 UTC (permalink / raw)
To: Dean S. Messing; +Cc: linux-raid, blc
On Sun, 7 Oct 2007, Dean S. Messing wrote:
>
> Justin Piszcz wrote:
>> On Fri, 5 Oct 2007, Dean S. Messing wrote:
>>>
>>> Brendan Conoboy wrote:
>>> <snip>
>>>> Is the onboard SATA controller real SATA or just an ATA-SATA
>>>> converter? If the latter, you're going to have trouble getting faster
>>>> performance than any one disk can give you at a time. The output of
>>>> 'lspci' should tell you if the onboard SATA controller is on its own
>>>> bus or sharing space with some other device. Pasting the output here
>>>> would be useful.
>>> <snip>
>>>
>>> N00bee question:
>>>
>>> How does one tell if a machine's disk controller is an ATA-SATA
>>> converter?
>>>
>>> The output of `lspci|fgrep -i sata' is:
>>>
>>> 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\
>>> (rev 09)
>>>
>>> suggests a real SATA. These references to ATA in "dmesg", however,
>>> make me wonder.
>>>
>>> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
>>> ata1.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>> ata1.00: configured for UDMA/133
>>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
>>> ata2.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>> ata2.00: configured for UDMA/133
>>> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
>>> ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>> ata3.00: configured for UDMA/133
>>>
>>>
>>> Dean
>>> -
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> His drives are either really old and do not support NCQ or he is not using
>> AHCI in the BIOS.
>
> Sorry, Justin, if I wasn't clear. I was asking the N00bee question
> about _my_own_ machine. The output of lspci (on my machine) seems to
> indicate I have a "real" STAT controller on the Motherboard, but the
> contents of "dmesg", with the references to ATA-7 and UDMA/133, made
> me wonder if I had just an ATA-SATA converter. Hence my question: how
> does one tell definitively if one has a real SATA controller on the Mother
> Board?
>
The output looks like a real (AHCI-capable) SATA controller and your
drives are using NCQ/AHCI.
Output from one of my machines:
[ 23.621462] ata1: SATA max UDMA/133 cmd 0xf8812100 ctl 0x00000000 bmdma
0x00000000 irq 219
[ 24.078390] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 24.549806] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
As far as why it shows UDMA/133 in the kernel output I am sure there is a
reason :)
I know in the older SATA drives there was a bridge chip that was used to
convert the drive from IDE<->SATA; maybe it is from those legacy days, not
sure.
With the newer NCQ/'native' SATA drives, the bridge chip should no longer
exist.
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-07 17:22 ` Andrew Clayton
@ 2007-10-11 17:06 ` Bill Davidsen
2007-10-11 18:07 ` Andrew Clayton
0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2007-10-11 17:06 UTC (permalink / raw)
To: Andrew Clayton
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton,
linux-raid
Andrew Clayton wrote:
> On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:
>
>
>> Can you start a 'vmstat 1' in one window, then start whatever you do
>> to get crappy performance. That would be interesting to see.
>>
>
> In trying to find something simple that can show the problem I'm
> seeing. I think I may have found the culprit.
>
> Just testing on my machine at home, I made this simple program.
>
> /* fslattest.c */
>
> #define _GNU_SOURCE
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <fcntl.h>
> #include <string.h>
>
>
> int main(int argc, char *argv[])
> {
> char file[255];
>
> if (argc < 2) {
> printf("Usage: fslattest file\n");
> exit(1);
> }
>
> strncpy(file, argv[1], 254);
> printf("Opening %s\n", file);
>
> while (1) {
> int testfd = open(file,
> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);
> close(testfd);
> unlink(file);
> sleep(1);
> }
>
> exit(0);
> }
>
>
> If I run this program under strace in my home directory (XFS file system
> on a (new) disk (no raid involved) all to its own.like
>
> $ strace -T -e open ./fslattest test
>
> It doesn't looks too bad.
>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.005043>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.016844>
>
> If I then start up a dd in the same place.
>
> $ dd if=/dev/zero of=bigfile bs=1M count=500
>
> Then I see the problem I'm seeing at work.
>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.000348>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.224636>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>
>
> Doing the same on my other disk which is Ext3 and contains the root fs,
> it doesn't ever stutter
>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.015423>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000092>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000093>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000088>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000103>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000096>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000094>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000091>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000107>
>
>
> Somewhere in there was the dd, but you can't tell.
>
> I've found if I mount the XFS filesystem with nobarrier, the
> latency is reduced to about 0.5 seconds with occasional spikes > 1
> second.
>
> When doing this on the raid array.
>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000071>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>
>
> dd kicks in
>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.222294>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.888863>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>
>
> dd finishes
>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>
>
>
> I guess I should take this to the XFS folks.
Try mounting the filesystem "noatime" and see if that's part of the problem.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-11 17:06 ` Bill Davidsen
@ 2007-10-11 18:07 ` Andrew Clayton
2007-10-11 23:43 ` Justin Piszcz
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-11 18:07 UTC (permalink / raw)
To: Bill Davidsen
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton,
linux-raid
On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:
> Andrew Clayton wrote:
> > On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:
> >
> > >> Can you start a 'vmstat 1' in one window, then start whatever
> > >> you do
> >> to get crappy performance. That would be interesting to see.
> >> >
> > In trying to find something simple that can show the problem I'm
> > seeing. I think I may have found the culprit.
> >
> > Just testing on my machine at home, I made this simple program.
> >
> > /* fslattest.c */
> >
> > #define _GNU_SOURCE
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/stat.h>
> > #include <sys/types.h>
> > #include <fcntl.h>
> > #include <string.h>
> >
> >
> > int main(int argc, char *argv[])
> > {
> > char file[255];
> >
> > if (argc < 2) {
> > printf("Usage: fslattest file\n");
> > exit(1);
> > }
> >
> > strncpy(file, argv[1], 254);
> > printf("Opening %s\n", file);
> >
> > while (1) {
> > int testfd = open(file, >
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600); close(testfd);
> > unlink(file);
> > sleep(1);
> > }
> >
> > exit(0);
> > }
> >
> >
> > If I run this program under strace in my home directory (XFS file
> > system on a (new) disk (no raid involved) all to its own.like
> >
> > $ strace -T -e open ./fslattest test
> >
> > It doesn't looks too bad.
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.005043> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.016844>
> >
> > If I then start up a dd in the same place.
> >
> > $ dd if=/dev/zero of=bigfile bs=1M count=500
> >
> > Then I see the problem I'm seeing at work.
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <2.000348> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <2.224636> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>
> >
> > Doing the same on my other disk which is Ext3 and contains the root
> > fs, it doesn't ever stutter
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.015423> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000092>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.000093> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000088>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.000103> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000096>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.000094> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.000091> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.000107>
> >
> >
> > Somewhere in there was the dd, but you can't tell.
> >
> > I've found if I mount the XFS filesystem with nobarrier, the
> > latency is reduced to about 0.5 seconds with occasional spikes > 1
> > second.
> >
> > When doing this on the raid array.
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000071>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>
> >
> > dd kicks in
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.222294>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.888863>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>
> >
> > dd finishes >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>
> >
> >
> > I guess I should take this to the XFS folks.
>
> Try mounting the filesystem "noatime" and see if that's part of the
> problem.
Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
regression.
http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2
Cheers,
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: RAID 5 performance issue.
2007-10-11 18:07 ` Andrew Clayton
@ 2007-10-11 23:43 ` Justin Piszcz
0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-11 23:43 UTC (permalink / raw)
To: Andrew Clayton
Cc: Bill Davidsen, John Stoffel, David Rees, Andrew Clayton,
linux-raid
On Thu, 11 Oct 2007, Andrew Clayton wrote:
> On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:
>
>> Andrew Clayton wrote:
>>> On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:
>>>
>>> >> Can you start a 'vmstat 1' in one window, then start whatever
>>> >> you do
>>>> to get crappy performance. That would be interesting to see.
>>>> >
>>> In trying to find something simple that can show the problem I'm
>>> seeing. I think I may have found the culprit.
>>>
>>> Just testing on my machine at home, I made this simple program.
>>>
>>> /* fslattest.c */
>>>
>>> #define _GNU_SOURCE
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <unistd.h>
>>> #include <sys/stat.h>
>>> #include <sys/types.h>
>>> #include <fcntl.h>
>>> #include <string.h>
>>>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>> char file[255];
>>>
>>> if (argc < 2) {
>>> printf("Usage: fslattest file\n");
>>> exit(1);
>>> }
>>>
>>> strncpy(file, argv[1], 254);
>>> printf("Opening %s\n", file);
>>>
>>> while (1) {
>>> int testfd = open(file, >
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600); close(testfd);
>>> unlink(file);
>>> sleep(1);
>>> }
>>>
>>> exit(0);
>>> }
>>>
>>>
>>> If I run this program under strace in my home directory (XFS file
>>> system on a (new) disk (no raid involved) all to its own.like
>>>
>>> $ strace -T -e open ./fslattest test
>>>
>>> It doesn't looks too bad.
>>>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.005043> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.016844>
>>>
>>> If I then start up a dd in the same place.
>>>
>>> $ dd if=/dev/zero of=bigfile bs=1M count=500
>>>
>>> Then I see the problem I'm seeing at work.
>>>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <2.000348> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <2.224636> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>
>>>
>>> Doing the same on my other disk which is Ext3 and contains the root
>>> fs, it doesn't ever stutter
>>>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.015423> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000092>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.000093> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000088>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.000103> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000096>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.000094> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.000091> open("test",
>>> O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
>>> <0.000107>
>>>
>>>
>>> Somewhere in there was the dd, but you can't tell.
>>>
>>> I've found if I mount the XFS filesystem with nobarrier, the
>>> latency is reduced to about 0.5 seconds with occasional spikes > 1
>>> second.
>>>
>>> When doing this on the raid array.
>>>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000071>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>
>>>
>>> dd kicks in
>>>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.222294>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.888863>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>
>>>
>>> dd finishes >
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
>>> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>
>>>
>>>
>>> I guess I should take this to the XFS folks.
>>
>> Try mounting the filesystem "noatime" and see if that's part of the
>> problem.
>
> Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
> regression.
>
> http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2
>
> Cheers,
>
> Andrew
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Nice! Thanks for reporting the final result, 1-2 weeks of
debugging/discussion, nice you found it.
Justin.
^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2007-10-11 23:43 UTC | newest]
Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-03 9:53 RAID 5 performance issue Andrew Clayton
2007-10-03 16:43 ` Justin Piszcz
2007-10-03 16:48 ` Justin Piszcz
2007-10-03 20:10 ` Andrew Clayton
2007-10-03 20:16 ` Justin Piszcz
2007-10-06 12:30 ` Justin Piszcz
2007-10-06 16:05 ` Justin Piszcz
2007-10-03 20:19 ` Andrew Clayton
2007-10-03 20:35 ` Justin Piszcz
2007-10-03 20:46 ` Andrew Clayton
2007-10-03 20:36 ` David Rees
2007-10-03 20:48 ` Andrew Clayton
2007-10-04 14:08 ` Andrew Clayton
2007-10-04 14:09 ` Justin Piszcz
2007-10-04 14:10 ` Justin Piszcz
2007-10-04 14:44 ` Andrew Clayton
2007-10-04 16:20 ` Justin Piszcz
2007-10-04 18:26 ` Andrew Clayton
2007-10-05 10:25 ` Justin Piszcz
2007-10-05 10:57 ` Andrew Clayton
2007-10-05 11:08 ` Justin Piszcz
2007-10-05 12:53 ` Andrew Clayton
2007-10-05 13:18 ` Justin Piszcz
2007-10-05 13:30 ` Andrew Clayton
2007-10-05 14:07 ` Justin Piszcz
2007-10-05 14:32 ` Andrew Clayton
2007-10-05 16:10 ` Andrew Clayton
2007-10-05 16:16 ` Justin Piszcz
2007-10-05 19:33 ` Andrew Clayton
2007-10-05 18:58 ` Richard Scobie
2007-10-05 19:02 ` Justin Piszcz
2007-10-05 19:02 ` John Stoffel
2007-10-05 19:42 ` Andrew Clayton
2007-10-05 20:56 ` John Stoffel
2007-10-07 17:22 ` Andrew Clayton
2007-10-11 17:06 ` Bill Davidsen
2007-10-11 18:07 ` Andrew Clayton
2007-10-11 23:43 ` Justin Piszcz
2007-10-04 14:36 ` Andrew Clayton
2007-10-04 14:39 ` Justin Piszcz
2007-10-04 15:03 ` Andrew Clayton
2007-10-04 16:19 ` Justin Piszcz
2007-10-04 19:01 ` Andrew Clayton
2007-10-04 16:46 ` Steve Cousins
2007-10-04 17:06 ` Steve Cousins
2007-10-04 19:06 ` Andrew Clayton
2007-10-05 10:20 ` Justin Piszcz
2007-10-04 14:39 ` Justin Piszcz
2007-10-03 17:53 ` Goswin von Brederlow
2007-10-03 20:20 ` Andrew Clayton
2007-10-03 20:48 ` Richard Scobie
2007-10-05 20:25 ` Brendan Conoboy
2007-10-06 0:38 ` Dean S. Messing
2007-10-06 8:18 ` Justin Piszcz
2007-10-08 1:40 ` Dean S. Messing
2007-10-08 8:44 ` Justin Piszcz