* RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-03 9:53 UTC
To: linux-raid
Hi,
Hardware:
Dual Opteron 2GHz CPUs, 2GB RAM, 4 x 250GB SATA hard drives. One (the root file system) is connected to the onboard Silicon Image 3114 controller. The other three (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I moved the three RAID disks off the onboard controller onto the card the other day to see if that would help; it didn't.
Software:
Fedora Core 6, 2.6.23-rc9 kernel.
Array/fs details:
Filesystems are XFS
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 xfs 20G 5.6G 14G 29% /
/dev/sda5 xfs 213G 3.6G 209G 2% /data
none tmpfs 1008M 0 1008M 0% /dev/shm
/dev/md0 xfs 466G 237G 229G 51% /home
/dev/md0 is currently mounted with the following options:

noatime,logbufs=8,sunit=512,swidth=1024

sunit and swidth seem to be set automatically.
xfs_info shows
meta-data=/dev/md0 isize=256 agcount=16, agsize=7631168 blks
= sectsz=4096 attr=1
data = bsize=4096 blocks=122097920, imaxpct=25
= sunit=64 swidth=128 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=2
= sectsz=4096 sunit=1 blks, lazy-count=0
realtime =none extsz=524288 blocks=0, rtextents=0
The array has a 256k chunk size using left-symmetric layout.
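As a sanity check of my own (the arithmetic below is not quoted from anywhere, just derived from the figures above), the two sets of units line up with the 256k chunk: the mount options count 512-byte sectors, xfs_info counts 4 KiB blocks.

  # assuming 512-byte sectors for the mount options, 4 KiB blocks for xfs_info
  echo "$((512  * 512 / 1024)) KiB"     # mount sunit   -> 256 KiB = one chunk
  echo "$((1024 * 512 / 1024)) KiB"     # mount swidth  -> 512 KiB = 2 data disks
  echo "$((64  * 4096 / 1024)) KiB"     # xfs_info sunit  -> 256 KiB
  echo "$((128 * 4096 / 1024)) KiB"     # xfs_info swidth -> 512 KiB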
/sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from 256 at best alleviates the problem).

I have also set /sys/block/sd[bcd]/queue/nr_requests to 512 (it doesn't seem to have made any difference).

I have also run blockdev --setra 8192 /dev/sd[bcd]; I tried 16384 and 32768 as well.

The I/O scheduler is cfq for all devices.
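For reference, the tuning described above boils down to roughly the following (the sysfs paths are the standard ones; the loop is just shorthand rather than a script I actually run):

  echo 4096 > /sys/block/md0/md/stripe_cache_size
  for d in sdb sdc sdd; do
      echo 512 > /sys/block/$d/queue/nr_requests
      blockdev --setra 8192 /dev/$d
      cat /sys/block/$d/queue/scheduler      # shows [cfq]
  done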
This machine acts as a file server for about 11 workstations. /home (the software RAID 5) is exported over NFS, from which the clients mount their home directories (using autofs).

I set it up about three years ago and it has been fine. However, earlier this year we started noticing application stalls, e.g. Firefox would become unresponsive and its window would grey out (under Compiz); this typically lasts 2-4 seconds.

During these stalls I see the iostat activity below (taken at 2-second intervals on the file server): high iowait and high awaits. stripe_cache_active maxes out and things grind to a halt for a few seconds until stripe_cache_active starts shrinking.
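The exact iostat invocation isn't important, but extended per-device stats in kB at a 2-second interval (matching the columns below) come from something like:

  iostat -x -k 2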
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.25 0.00 99.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 5.47 0.00 40.80 14.91 0.05 9.73 7.18 3.93
sdb 0.00 0.00 1.49 1.49 5.97 9.95 10.67 0.06 18.50 9.00 2.69
sdc 0.00 0.00 0.00 2.99 0.00 15.92 10.67 0.01 4.17 4.17 1.24
sdd 0.00 0.00 0.50 2.49 1.99 13.93 10.67 0.02 5.67 5.67 1.69
md0 0.00 0.00 0.00 1.99 0.00 7.96 8.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 5.24 1.50 0.00 93.02
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 12.50 0.00 85.75 13.72 0.12 9.60 6.28 7.85
sdb 182.50 275.00 114.00 17.50 986.00 82.00 16.24 337.03 660.64 6.06 79.70
sdc 171.00 269.50 117.00 20.00 1012.00 94.00 16.15 315.35 677.73 5.86 80.25
sdd 149.00 278.00 107.00 18.50 940.00 84.00 16.32 311.83 705.33 6.33 79.40
md0 0.00 0.00 0.00 1012.00 0.00 8090.00 15.99 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.50 44.61 0.00 53.88
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 1.00 0.00 4.25 8.50 0.00 0.00 0.00 0.00
sdb 168.50 64.00 129.50 58.00 1114.00 508.00 17.30 645.37 1272.90 5.34 100.05
sdc 194.00 76.50 141.50 43.00 1232.00 360.00 17.26 664.01 916.30 5.42 100.05
sdd 172.00 90.50 114.50 50.00 996.00 456.00 17.65 662.54 977.28 6.08 100.05
md0 0.00 0.00 0.50 8.00 2.00 32.00 8.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 1.50 48.50 0.00 49.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 1.50 0.00 2.50 3.33 0.00 0.33 0.33 0.05
sdb 0.00 142.50 63.50 115.50 558.00 1030.00 17.74 484.58 2229.89 5.59 100.10
sdc 0.00 113.00 63.00 114.50 534.00 994.00 17.22 507.33 2879.95 5.64 100.10
sdd 0.00 118.50 56.50 87.00 482.00 740.00 17.03 546.09 2650.33 6.98 100.10
md0 0.00 0.00 1.00 2.00 6.00 8.00 9.33 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.25 86.03 0.00 12.72
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.50 0.00 1.50 0.00 6.25 8.33 0.00 1.33 0.67 0.10
sdb 0.00 171.00 0.00 238.50 0.00 2164.00 18.15 320.17 3555.60 4.20 100.10
sdc 0.00 172.00 0.00 195.50 0.00 1776.00 18.17 372.72 3696.45 5.12 100.10
sdd 0.00 188.50 0.00 144.50 0.00 1318.00 18.24 528.15 3935.08 6.93 100.10
md0 0.00 0.00 0.00 1.50 0.00 6.00 8.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.75 73.50 0.00 25.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.50 67.16 1.49 181.59 7.96 1564.18 17.17 119.48 1818.11 4.61 84.48
sdc 0.50 70.65 1.99 177.11 9.95 1588.06 17.84 232.45 2844.31 5.56 99.60
sdd 0.00 77.11 1.49 149.75 5.97 1371.14 18.21 484.19 4728.82 6.59 99.60
md0 0.00 0.00 0.00 1.99 0.00 11.94 12.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
When stracing Firefox (on the client) during its stall periods, I see multi-second stalls in the open, close and unlink system calls, e.g.:
open("/home/andrew/.mozilla/firefox/default.d9m/Cache/1A190CD5d01", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 39 <8.239256>
close(39) = 0 <1.125843>
When it's behaving, I get numbers more like:
open("/home/andrew/.mozilla/firefox/default.d9m/sessionstore-1.js",
O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0600) = 39 <0.008773>
close(39) = 0 <0.265877>
Not the same file, but sessionstore-1.js is 56K and 1A190CD5d01 is 37K.
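The per-call timings come from strace's -T option; the invocation was along the lines of the following (the PID placeholder is just whatever Firefox happens to be running as):

  strace -T -f -e trace=open,close,unlink -p <firefox-pid>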
vim also has noticeable stalls, probably when it is writing its swap file.

My music is stored on the server and playback never seems to be affected (the player accesses the files straight over NFS).
I have put up the current kernel config at
http://digital-domain.net/kernel/sw-raid5-issue/config
and the output of mdadm -D /dev/md0 at
http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D
If anyone has any ideas, I'm all ears.
Having just composed this message, I see this thread:
http://www.spinics.net/lists/raid/msg17190.html
I do remember seeing a lot of pdflush activity (using blktrace) around the
times of the stalls, but I don't seem to get the high CPU usage.
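The blktrace observation was roughly of this form, tracing the member disks and watching for pdflush in the parsed output (the grep is only illustrative):

  blktrace -d /dev/sdb -d /dev/sdc -d /dev/sdd -o - | blkparse -i - | grep pdflush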
Cheers,
Andrew
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-03 16:43 UTC
To: Andrew Clayton; +Cc: linux-raid

Have you checked fragmentation?

  xfs_db -c frag -f /dev/md3

What does this report?

Justin.
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-03 16:48 UTC
To: Andrew Clayton; +Cc: linux-raid

Also, if it is software RAID, when you make the XFS filesystem on it, it
sets up a proper (and tuned) sunit/swidth, so why would you want to
change that?

Justin.
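(mkfs.xfs normally derives sunit/swidth from the md geometry by itself. Spelling them out by hand, purely as an illustration with values matching a 256k chunk and two data disks, would look like:)

  mkfs.xfs -d su=256k,sw=2 /dev/md0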
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-03 20:10 UTC
To: Justin Piszcz; +Cc: Andrew Clayton, linux-raid

On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:

> Also, if it is software RAID, when you make the XFS filesystem on it,
> it sets up a proper (and tuned) sunit/swidth, so why would you want to
> change that?

Oh, I didn't; the sunit and swidth were set automatically. Do they look
sane? From reading the XFS section of the mount man page, I'm not
entirely sure what they specify and certainly wouldn't have any idea
what to set them to.

Cheers,

Andrew
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-03 20:16 UTC
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid

On Wed, 3 Oct 2007, Andrew Clayton wrote:

> Oh, I didn't; the sunit and swidth were set automatically. Do they look
> sane?

You should not need to set them as mount options unless you are
overriding the defaults.

Justin.
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-06 12:30 UTC
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid

As long as you ran mkfs.xfs /dev/md0, it should have optimized the
filesystem according to the disks beneath it.

Justin.
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-06 16:05 UTC
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid

Also, can you provide the smartctl -a output for each disk (/dev/sda,
/dev/sdb, etc.)?
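(smartctl takes one device per invocation, so gathering that would be something like:)

  for d in /dev/sd[abcd]; do smartctl -a $d; done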
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-03 20:19 UTC
To: Justin Piszcz; +Cc: Andrew Clayton, linux-raid

On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:

> Have you checked fragmentation?

You know, that never even occurred to me. I've gotten into the mindset
that it's generally not a problem under Linux.

> xfs_db -c frag -f /dev/md3
>
> What does this report?

# xfs_db -c frag -f /dev/md0
actual 1828276, ideal 1708782, fragmentation factor 6.54%

Good or bad?

Seeing as this filesystem will be three years old in December, that
doesn't seem overly bad.

I'm currently looking to things like

http://lwn.net/Articles/249450/ and
http://lwn.net/Articles/242559/

for potential help; fortunately it seems I won't have too long to wait.

Cheers,

Andrew
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-03 20:35 UTC
To: Andrew Clayton; +Cc: Andrew Clayton, linux-raid

What does cat /sys/block/md0/md/mismatch_cnt say?

That fragmentation looks normal/fine.

Justin.
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-03 20:46 UTC
To: Justin Piszcz; +Cc: Andrew Clayton, linux-raid

On Wed, 3 Oct 2007 16:35:21 -0400 (EDT), Justin Piszcz wrote:

> What does cat /sys/block/md0/md/mismatch_cnt say?

$ cat /sys/block/md0/md/mismatch_cnt
0

> That fragmentation looks normal/fine.

Cool.

Andrew
* Re: RAID 5 performance issue.
From: David Rees @ 2007-10-03 20:36 UTC
To: Andrew Clayton; +Cc: Justin Piszcz, Andrew Clayton, linux-raid

On 10/3/07, Andrew Clayton <andrew@digital-domain.net> wrote:
> You know, that never even occurred to me. I've gotten into the mindset
> that it's generally not a problem under Linux.

It's probably not the root cause, but it certainly doesn't help things.
At least with XFS you have an easy way to defrag the filesystem without
even taking it offline.

> # xfs_db -c frag -f /dev/md0
> actual 1828276, ideal 1708782, fragmentation factor 6.54%
>
> Good or bad?

Not bad, but not that good, either. Try running xfs_fsr from a nightly
cronjob. By default, it will defrag mounted XFS filesystems for up to
2 hours. Typically this is enough to keep fragmentation well below 1%.

-Dave
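(A nightly cron entry of the kind being suggested might look like the following; the 02:00 start time and /etc/crontab format are only illustrative, and -t limits the run time in seconds, 7200 being xfs_fsr's default anyway:)

  0 2 * * * root /usr/sbin/xfs_fsr -t 7200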
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-03 20:48 UTC
To: David Rees; +Cc: Justin Piszcz, Andrew Clayton, linux-raid

On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:

> Not bad, but not that good, either. Try running xfs_fsr from a nightly
> cronjob. By default, it will defrag mounted XFS filesystems for up to
> 2 hours. Typically this is enough to keep fragmentation well below 1%.

Worth a shot.

Andrew
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-04 14:08 UTC
To: David Rees; +Cc: Justin Piszcz, Andrew Clayton, linux-raid

On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:

> Try running xfs_fsr from a nightly cronjob.

I ran it last night on the RAID array; it got the fragmentation down to
1.07%. Unfortunately that doesn't seem to have helped.

Cheers,

Andrew
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-04 14:09 UTC
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

Is NCQ enabled on the drives?
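(One way to check, by way of illustration rather than anything asked for here: libata reports a per-device queue depth, and a depth of 1 usually means NCQ is off, while the boot log mentions NCQ support explicitly.)

  cat /sys/block/sd[bcd]/device/queue_depth
  dmesg | grep -i ncq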
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-04 14:10 UTC
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

Also, did performance just go to crap one day or was it gradual?

Justin.
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-04 14:44 UTC
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:

> Also, did performance just go to crap one day or was it gradual?

IIRC I just noticed one day that Firefox and vim were stalling. That was
back in February/March, I think. At the time the server was running a
2.6.18 kernel; since then I've tried a few kernels in between that and,
currently, 2.6.23-rc9.

Something seems to be periodically causing a lot of activity that maxes
out the stripe cache for a few seconds (when I was trying to look with
blktrace, it seemed pdflush was doing a lot of activity during this
time).

What I had noticed just recently was that when I was the only one doing
IO on the server (no NFS running and I was logged in at the console),
even just patching the kernel was crawling to a halt.

Cheers,

Andrew
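(Since pdflush keeps coming up: the knobs that govern when dirty pages get written back are the vm.dirty_* sysctls. Nobody in the thread suggests changing them; this is just where one would look.)

  sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs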
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-04 16:20 UTC
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007, Andrew Clayton wrote:

> What I had noticed just recently was that when I was the only one doing
> IO on the server (no NFS running and I was logged in at the console),
> even just patching the kernel was crawling to a halt.

Besides the NCQ issue, your problem is a bit perplexing.

Just out of curiosity, have you run memtest86 for at least one pass to
make sure there are no problems with the memory?

Do you have a script showing all of the parameters that you use to
optimize the array?

Also, the mdadm -D /dev/md0 output, please?

What distribution are you running? (Not that it should matter, but just
curious.)

Justin.
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-04 18:26 UTC
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007 12:20:25 -0400 (EDT), Justin Piszcz wrote:

> Just out of curiosity, have you run memtest86 for at least one pass to
> make sure there are no problems with the memory?

No, I haven't.

> Do you have a script showing all of the parameters that you use to
> optimize the array?

No script; nothing that I change really seems to make any difference.

Currently I have /sys/block/md0/md/stripe_cache_size set at 16384. It
doesn't really seem to matter what I set it to, as stripe_cache_active
will periodically reach that value and take a few seconds to come back
down.

/sys/block/sd[bcd]/queue/nr_requests is set to 512, and readahead is set
to 8192 on sd[bcd]. But none of that really seems to make any
difference.

> Also, the mdadm -D /dev/md0 output, please?

http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D

> What distribution are you running? (Not that it should matter, but just
> curious.)

Fedora Core 6 (though I'm fairly sure it was happening before upgrading
from Fedora Core 5).

The iostat output of the drives when the problem occurs has the same
profile as when the backup is going onto the USB 1.1 hard drive: the IO
wait goes up, the CPU % hits 100% and we see multi-second await times.
Which is why I thought the onboard controller might be a bottleneck,
like the USB 1.1 is really slow, and moved the disks onto the PCI card.
But when I saw that even patching the kernel was going really slowly, I
thought it can't really be the problem, as it didn't used to go that
slow.

It's a tricky one...

Cheers,

Andrew
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-05 10:25 UTC
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

So you have 3 SATA 1 disks:

http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D

Do you compile your own kernel or use the distribution's kernel?

What does cat /proc/interrupts say? This is important to see if your
disk controller(s) are sharing IRQs with other devices.

Also note that with only 3 disks in a RAID-5 you will not get stellar
performance, but regardless, it should not be 'hanging' as you have
mentioned. Just out of sheer curiosity, have you tried the AS scheduler?
CFQ is supposed to be better for multi-user performance, but I would be
highly interested in whether using the AS scheduler changes the
'hanging' problem you are noticing. I would give it a shot; also try
deadline and noop.

You probably want to keep nr_requests at 128 and the stripe_cache_size
at 8 MB. The stripe size of 256k is probably optimal.

Did you also re-mount the XFS partition with the default mount options
(or just take the sunit and swidth)?

Justin.
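(Acting on those suggestions amounts to something like the following; the 8192 stripe_cache_size is the value that actually gets used later in the thread, and the scheduler/nr_requests writes need repeating for sdc and sdd:)

  echo anticipatory > /sys/block/sdb/queue/scheduler
  echo 128 > /sys/block/sdb/queue/nr_requests
  echo 8192 > /sys/block/md0/md/stripe_cache_size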
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-05 10:57 UTC
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid

On Fri, 5 Oct 2007 06:25:20 -0400 (EDT), Justin Piszcz wrote:

> So you have 3 SATA 1 disks:

Yeah, 3 of them in the array; there is a fourth standalone disk which
contains the root fs from which the system boots.

> Do you compile your own kernel or use the distribution's kernel?

I compile my own.

> What does cat /proc/interrupts say? This is important to see if your
> disk controller(s) are sharing IRQs with other devices.

$ cat /proc/interrupts
           CPU0       CPU1
  0:     132052  249369403   IO-APIC-edge      timer
  1:        202         52   IO-APIC-edge      i8042
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 14:      11483        172   IO-APIC-edge      ide0
 16:   18041195    4798850   IO-APIC-fasteoi   sata_sil24
 18:   86068930         27   IO-APIC-fasteoi   eth0
 19:   16127662    2138177   IO-APIC-fasteoi   sata_sil, ohci_hcd:usb1, ohci_hcd:usb2
NMI:          0          0
LOC:  249368914  249368949
ERR:          0

sata_sil24 contains the RAID array, sata_sil the root fs disk.

> Just out of sheer curiosity, have you tried the AS scheduler?
> I would give it a shot; also try deadline and noop.

I did try them briefly. I'll have another go.

> You probably want to keep nr_requests at 128 and the stripe_cache_size
> at 8 MB. The stripe size of 256k is probably optimal.

OK.

> Did you also re-mount the XFS partition with the default mount options
> (or just take the sunit and swidth)?

The /etc/fstab entry for the RAID array is currently:

/dev/md0    /home    xfs    noatime,logbufs=8    1 2

and mount says

/dev/md0 on /home type xfs (rw,noatime,logbufs=8)

and /proc/mounts

/dev/md0 /home xfs rw,noatime,logbufs=8,sunit=512,swidth=1024 0 0

So I guess mount or the kernel is setting the sunit and swidth values.

Andrew
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-05 11:08 UTC
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

On Fri, 5 Oct 2007, Andrew Clayton wrote:

> So I guess mount or the kernel is setting the sunit and swidth values.

The sunit/swidth mount options are from when the filesystem was made, I
believe.

  -N  Causes the file system parameters to be printed out without
      really creating the file system.

You should be able to run mkfs.xfs -N /dev/md0 to get that information.

/dev/md3    /r1    xfs    noatime,nodiratime,logbufs=8,logbsize=262144    0 1

Try using the above mount options and the AS scheduler, and let me know
if you still notice any 'hangs'.

Justin.
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-05 12:53 UTC
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid

On Fri, 5 Oct 2007 07:08:51 -0400 (EDT), Justin Piszcz wrote:

> You should be able to run mkfs.xfs -N /dev/md0 to get that information.

I can't do that while it's mounted. Would xfs_info show the same stuff?

> Try using the above mount options and the AS scheduler, and let me know
> if you still notice any 'hangs'.

OK, I've remounted (mount -o remount) with those options, set the
stripe_cache_size to 8192, set nr_requests back to 128 and set the
schedulers to anticipatory.

Unfortunately the problem remains.

I'll try the noop scheduler, as I don't think I ever tried that one.

Andrew
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-05 13:18 UTC
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

How are you measuring the problem? How can it be reproduced?

Justin.
* Re: RAID 5 performance issue.
From: Andrew Clayton @ 2007-10-05 13:30 UTC
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, linux-raid

On Fri, 5 Oct 2007 13:53:12 +0100, Andrew Clayton wrote:

> I'll try the noop scheduler, as I don't think I ever tried that one.

That didn't help either, oh well.

If I hit the disk in my workstation with a big dd, then in iostat I see
it maxing out at about 40MB/sec with > 1 second awaits. The server seems
to hit this at a much lower rate, maybe < 10MB/sec.

I think I'm also going to move the RAID disks back onto the onboard
controller (as Goswin von Brederlow said, it should have more bandwidth
anyway), as the PCI card doesn't seem to have helped and I'm seeing soft
SATA resets coming from it, e.g.

ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata6.00: irq_stat 0x00020002, device error via D2H FIS
ata6.00: cmd 35/00:00:07:4a:d9/00:04:02:00:00/e0 tag 0 cdb 0x0 data 524288 out
         res 51/84:00:06:4e:d9/00:00:02:00:00/e2 Emask 0x10 (ATA bus error)
ata6: soft resetting port
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: configured for UDMA/100
ata6: EH complete

Just to confirm: I was seeing the problem with the onboard controller
and thought moving the disks to the PCI card might help (at £35 it was
worth a shot!).

Cheers,

Andrew
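(The 'big dd' isn't spelled out; a write test of the sort being described would be something like the following, with an illustrative path and size, and conv=fdatasync so the page cache doesn't flatter the number:)

  dd if=/dev/zero of=/home/ddtest bs=1M count=2048 conv=fdatasync
  rm /home/ddtest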
* Re: RAID 5 performance issue.
From: Justin Piszcz @ 2007-10-05 14:07 UTC
To: Andrew Clayton; +Cc: Andrew Clayton, David Rees, linux-raid

On Fri, 5 Oct 2007, Andrew Clayton wrote:

> I think I'm also going to move the RAID disks back onto the onboard
> controller (as Goswin von Brederlow said, it should have more bandwidth
> anyway), as the PCI card doesn't seem to have helped and I'm seeing soft
> SATA resets coming from it.

Yikes, yeah, I would get them off the PCI card. What kind of motherboard
is it? If you don't have a PCIe-based board it probably won't help THAT
much, but it should still be better than placing 3 drives on a PCI card.

Justin.
* Re: RAID 5 performance issue.
  2007-10-05 14:07 ` Justin Piszcz
@ 2007-10-05 14:32   ` Andrew Clayton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 14:32 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, David Rees, linux-raid

On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:

> Yikes, yeah I would get them off the PCI card. What kind of motherboard
> is it? If you don't have a PCI-e based board it probably won't help THAT
> much, but it still should be better than placing 3 drives on a PCI card.

It's a Tyan Thunder K8S Pro S2882. No PCIe.

That said, simply patching the kernel (on the RAID fs) slows to a crawl
even when there's no other disk activity, which I'm fairly sure it
didn't used to, so these app stalls are certainly new. The only trouble
is I don't have any iostat profile from, say, a year ago when everything
was OK, so I can't be 100% sure the current pattern of iowait and await
spikes didn't actually always happen and it's really something else
that's wrong.

> Justin.

Andrew
* Re: RAID 5 performance issue.
  2007-10-05 14:07 ` Justin Piszcz
  2007-10-05 14:32   ` Andrew Clayton
@ 2007-10-05 16:10   ` Andrew Clayton
  2007-10-05 16:16     ` Justin Piszcz
  2007-10-05 18:58     ` Richard Scobie
  1 sibling, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 16:10 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, David Rees, linux-raid

On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:

> Yikes, yeah I would get them off the PCI card. What kind of motherboard
> is it? If you don't have a PCI-e based board it probably won't help THAT
> much, but it still should be better than placing 3 drives on a PCI card.

Moved the drives back onto the on board controller.

While I had the machine down I ran memtest86+ for about 5 mins, no
errors.

I also got the output of mkfs.xfs -f -N /dev/md0

meta-data=/dev/md0         isize=256    agcount=16, agsize=7631168 blks
         =                 sectsz=4096  attr=0
data     =                 bsize=4096   blocks=122097920, imaxpct=25
         =                 sunit=64     swidth=128 blks, unwritten=1
naming   =version 2        bsize=4096
log      =internal log     bsize=4096   blocks=32768, version=2
         =                 sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none             extsz=524288 blocks=0, rtextents=0

> Justin.

Thanks for your help by the way.

Andrew
* Re: RAID 5 performance issue.
  2007-10-05 16:10 ` Andrew Clayton
@ 2007-10-05 16:16   ` Justin Piszcz
  2007-10-05 19:33     ` Andrew Clayton
  1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 16:16 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Andrew Clayton, David Rees, linux-raid

On Fri, 5 Oct 2007, Andrew Clayton wrote:

> Moved the drives back onto the on board controller.
>
> While I had the machine down I ran memtest86+ for about 5 mins, no
> errors.
>
> I also got the output of mkfs.xfs -f -N /dev/md0
>
<snip>
>
> Thanks for your help by the way.
>
> Andrew

Hm, unfortunately at this point I think I am out of ideas; you may need
to ask the XFS/linux-raid developers how to run blktrace during those
operations to figure out what is going on.

BTW: Last thing I can think of, did you make any changes to PREEMPTION
in the kernel, or do you disable it (SERVER)?

Justin.
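For reference, a minimal sketch, not from the original posts, of the kind of blktrace capture being suggested here (the 30-second window and output filenames are illustrative assumptions; it needs CONFIG_BLK_DEV_IO_TRACE and root):

#!/bin/sh
# Trace 30 seconds of block-layer activity on the array while the stall
# is reproduced in another window, then turn the binary trace into text.
mount -t debugfs debugfs /sys/kernel/debug 2>/dev/null
blktrace -w 30 -d /dev/md0 -o md0.stall
blkparse -i md0.stall > md0.stall.txt
less md0.stall.txt        # look at what pdflush and md0_raid5 are doing

Running the same capture against one of the member disks (-d /dev/sdb) during a stall would show whether the requests are queuing above or below the md layer.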
* Re: RAID 5 performance issue.
  2007-10-05 16:16 ` Justin Piszcz
@ 2007-10-05 19:33   ` Andrew Clayton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 19:33 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Andrew Clayton, David Rees, linux-raid

On Fri, 5 Oct 2007 12:16:07 -0400 (EDT), Justin Piszcz wrote:

> Hm, unfortunately at this point I think I am out of ideas; you may need
> to ask the XFS/linux-raid developers how to run blktrace during those
> operations to figure out what is going on.

No problem, cheers.

> BTW: Last thing I can think of, did you make any changes to PREEMPTION
> in the kernel, or do you disable it (SERVER)?

I normally have it disabled, but I did try with voluntary preemption,
with no effect.

> Justin.

Andrew
* Re: RAID 5 performance issue.
  2007-10-05 16:10 ` Andrew Clayton
  2007-10-05 16:16   ` Justin Piszcz
@ 2007-10-05 18:58   ` Richard Scobie
  2007-10-05 19:02     ` Justin Piszcz
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Scobie @ 2007-10-05 18:58 UTC (permalink / raw)
To: linux-raid

Have you had a look at the smartctl -a outputs of all the drives?

Possibly one drive is being slow to respond due to seek errors etc.,
but I would perhaps expect to be seeing this in the log.

If you have a full backup and a spare drive, I would probably rotate it
through the array.

Regards,

Richard
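A small sketch of that check, not from the original posts, looping over the array members (device names taken from earlier in the thread; the attribute list is just a reasonable starting point, not exhaustive; run as root):

#!/bin/sh
# Dump full SMART data per member, then pull out the attributes that most
# often explain a single slow drive: pending/reallocated sectors and seek
# errors.
for d in /dev/sdb /dev/sdc /dev/sdd; do
        echo "===== $d ====="
        smartctl -a "$d" > "/tmp/smart.$(basename $d).txt"
        grep -i -e Current_Pending_Sector -e Reallocated_Sector_Ct \
                -e Seek_Error_Rate "/tmp/smart.$(basename $d).txt"
done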
* Re: RAID 5 performance issue.
  2007-10-05 18:58 ` Richard Scobie
@ 2007-10-05 19:02   ` Justin Piszcz
  0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 19:02 UTC (permalink / raw)
To: Richard Scobie; +Cc: linux-raid

On Sat, 6 Oct 2007, Richard Scobie wrote:

> Have you had a look at the smartctl -a outputs of all the drives?
>
> Possibly one drive is being slow to respond due to seek errors etc.,
> but I would perhaps expect to be seeing this in the log.
>
> If you have a full backup and a spare drive, I would probably rotate it
> through the array.
>
> Regards,
>
> Richard

Forgot about that, yeah, post the smartctl -a output for each drive
please.
* Re: RAID 5 performance issue.
  2007-10-04 14:44 ` Andrew Clayton
  2007-10-04 16:20   ` Justin Piszcz
@ 2007-10-05 19:02   ` John Stoffel
  2007-10-05 19:42     ` Andrew Clayton
  1 sibling, 1 reply; 56+ messages in thread
From: John Stoffel @ 2007-10-05 19:02 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid

>>>>> "Andrew" == Andrew Clayton <andrew@digital-domain.net> writes:

Andrew> On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:

>> Also, did performance just go to crap one day or was it gradual?

Andrew> IIRC I just noticed one day that firefox and vim were
Andrew> stalling. That was back in February/March I think. At the time
Andrew> the server was running a 2.6.18 kernel; since then I've tried
Andrew> a few kernels in between that and currently 2.6.23-rc9.

Andrew> Something seems to be periodically causing a lot of activity
Andrew> that maxes out the stripe_cache for a few seconds (when I was
Andrew> trying to look with blktrace, it seemed pdflush was doing a
Andrew> lot of activity during this time).

Andrew> What I had noticed just recently was that when I was the only
Andrew> one doing IO on the server (no NFS running and I was logged in
Andrew> at the console), even just patching the kernel was crawling to
Andrew> a halt.

How much memory does this system have? Have you checked the output of
/proc/mtrr at all? There have been reports of systems with a bad BIOS
that gets the memory map wrong, causing access to memory to slow down
drastically.

So if you have 2gb of RAM, try booting with mem=1900m or something like
that and see if things are better for you.

Make sure your BIOS is up to the latest level as well.

John
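A small sketch of the checks being suggested, not from the original posts (the procfs paths are standard; the mem= figure and the bootloader path are just the obvious ones for the FC6 setup described in the thread):

#!/bin/sh
# Compare how much RAM the kernel sees with what the MTRRs map as
# write-back; a BIOS that leaves part of RAM uncached shows up here.
cat /proc/mtrr
grep MemTotal /proc/meminfo
# To test the workaround, append mem=1900m to the kernel line in the
# bootloader config (e.g. /boot/grub/grub.conf on FC6) and reboot.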
* Re: RAID 5 performance issue.
  2007-10-05 19:02 ` John Stoffel
@ 2007-10-05 19:42   ` Andrew Clayton
  2007-10-05 20:56     ` John Stoffel
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-05 19:42 UTC (permalink / raw)
To: John Stoffel; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid

On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote:

> How much memory does this system have? Have you checked the output of

2GB

> /proc/mtrr at all? There have been reports of systems with a bad

$ cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1

> BIOS that gets the memory map wrong, causing access to memory to slow
> down drastically.

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)

full dmesg (from 2.6.21-rc8-git2) at
http://digital-domain.net/kernel/sw-raid5-issue/dmesg

> So if you have 2gb of RAM, try booting with mem=1900m or something

Worth a shot.

> like that and see if things are better for you.
>
> Make sure your BIOS is up to the latest level as well.

Hmm, I'll see what's involved in that.

> John

Cheers,

Andrew
* Re: RAID 5 performance issue.
  2007-10-05 19:42 ` Andrew Clayton
@ 2007-10-05 20:56   ` John Stoffel
  2007-10-07 17:22     ` Andrew Clayton
  0 siblings, 1 reply; 56+ messages in thread
From: John Stoffel @ 2007-10-05 20:56 UTC (permalink / raw)
To: Andrew Clayton
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton, linux-raid

Andrew> On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote:
>> How much memory does this system have? Have you checked the output of

Andrew> 2GB

>> /proc/mtrr at all? There have been reports of systems with a bad

Andrew> $ cat /proc/mtrr
Andrew> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1

That looks to be good; all the memory is there, all in the same region.
Oh well... it was a thought.

>> BIOS that gets the memory map wrong, causing access to memory to slow
>> down drastically.

Andrew> BIOS-provided physical RAM map:
Andrew>  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Andrew>  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Andrew>  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Andrew>  BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
Andrew>  BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
Andrew>  BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
Andrew>  BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)

I dunno about this part.

Andrew> full dmesg (from 2.6.21-rc8-git2) at
Andrew> http://digital-domain.net/kernel/sw-raid5-issue/dmesg

>> So if you have 2gb of RAM, try booting with mem=1900m or something

Andrew> Worth a shot.

It might make a difference, might not.

Do you have any kernel debugging options turned on? That might also be
an issue. Check your .config; there are a couple of options which
drastically slow down the system.

>> like that and see if things are better for you.
>>
>> Make sure your BIOS is up to the latest level as well.

Andrew> Hmm, I'll see what's involved in that.

At this point, I don't suspect the BIOS any more.

Can you start a 'vmstat 1' in one window, then start whatever you do to
get crappy performance? That would be interesting to see.

John
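A minimal sketch of those two suggestions, not from the original posts (the option names are the usual suspects for heavy debug overhead; adjust the .config path to wherever the running kernel's config lives):

#!/bin/sh
# 1) Look for debug options that are known to slow a kernel down a lot.
grep -E 'CONFIG_(DEBUG_SLAB|DEBUG_PAGEALLOC|DEBUG_MUTEXES|PROVE_LOCKING|DEBUG_SPINLOCK)=y' \
        /usr/src/linux/.config
# 2) Log one-second vmstat samples while reproducing the stall, so the
#    blocked-process (b) and iowait (wa) columns can be lined up with it.
vmstat 1 60 > /tmp/vmstat-during-stall.txt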
* Re: RAID 5 performance issue.
  2007-10-05 20:56 ` John Stoffel
@ 2007-10-07 17:22   ` Andrew Clayton
  2007-10-11 17:06     ` Bill Davidsen
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-07 17:22 UTC (permalink / raw)
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton, linux-raid

On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

> Can you start a 'vmstat 1' in one window, then start whatever you do to
> get crappy performance? That would be interesting to see.

While trying to find something simple that can show the problem I'm
seeing, I think I may have found the culprit.

Just testing on my machine at home, I made this simple program.

/* fslattest.c */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char *argv[])
{
        char file[255];

        if (argc < 2) {
                printf("Usage: fslattest file\n");
                exit(1);
        }

        strncpy(file, argv[1], 254);
        printf("Opening %s\n", file);

        while (1) {
                int testfd = open(file,
                        O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);
                close(testfd);
                unlink(file);
                sleep(1);
        }

        exit(0);
}

If I run this program under strace in my home directory (XFS file system
on a (new) disk all to its own, no raid involved), like

$ strace -T -e open ./fslattest test

it doesn't look too bad.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.005043>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.016844>

If I then start up a dd in the same place

$ dd if=/dev/zero of=bigfile bs=1M count=500

then I see the problem I'm seeing at work.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.000348>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.224636>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>

Doing the same on my other disk, which is Ext3 and contains the root fs,
it doesn't ever stutter.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.015423>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000092>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000093>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000088>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000103>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000096>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000094>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000091>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000107>

Somewhere in there was the dd, but you can't tell.

I've found that if I mount the XFS filesystem with nobarrier, the
latency is reduced to about 0.5 seconds, with occasional spikes > 1
second.

When doing this on the raid array:

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000071>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>

dd kicks in

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.222294>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.888863>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>

dd finishes

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>

I guess I should take this to the XFS folks.

> John

Cheers,

Andrew
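For anyone wanting to reproduce this, a sketch (not from the original posts) that wraps the test above into one script; it assumes fslattest.c has been saved as shown and that the script is run from a directory on the filesystem under test:

#!/bin/sh
# Build the latency probe, start it under strace, then generate write-out
# pressure with dd and see how long each open() takes while dd runs.
gcc -Wall -o fslattest fslattest.c
strace -T -e trace=open ./fslattest test 2> open-latency.log &
PROBE=$!
sleep 5                                   # a few quiet samples first
dd if=/dev/zero of=bigfile bs=1M count=500
sleep 5                                   # a few samples after dd ends
kill $PROBE
pkill fslattest 2>/dev/null               # strace may detach on kill
rm -f bigfile test
grep -o '<[0-9.]*>' open-latency.log      # per-call open() latencies

Comparing the output with and without the dd running, and across filesystems, gives the same picture as the hand-run traces above.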
* Re: RAID 5 performance issue.
  2007-10-07 17:22 ` Andrew Clayton
@ 2007-10-11 17:06   ` Bill Davidsen
  2007-10-11 18:07     ` Andrew Clayton
  0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2007-10-11 17:06 UTC (permalink / raw)
To: Andrew Clayton
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton, linux-raid

Andrew Clayton wrote:
> On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:
>
>> Can you start a 'vmstat 1' in one window, then start whatever you do to
>> get crappy performance? That would be interesting to see.
>
> While trying to find something simple that can show the problem I'm
> seeing, I think I may have found the culprit.
>
> Just testing on my machine at home, I made this simple program.
>
> /* fslattest.c */
>
<snip>
>
> If I then start up a dd in the same place
>
> $ dd if=/dev/zero of=bigfile bs=1M count=500
>
> then I see the problem I'm seeing at work.
>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.000348>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.224636>
> open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>
>
> Doing the same on my other disk, which is Ext3 and contains the root fs,
> it doesn't ever stutter.
>
<snip>
>
> I guess I should take this to the XFS folks.

Try mounting the filesystem "noatime" and see if that's part of the
problem.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
* Re: RAID 5 performance issue.
  2007-10-11 17:06 ` Bill Davidsen
@ 2007-10-11 18:07   ` Andrew Clayton
  2007-10-11 23:43     ` Justin Piszcz
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-11 18:07 UTC (permalink / raw)
To: Bill Davidsen
Cc: John Stoffel, Justin Piszcz, David Rees, Andrew Clayton, linux-raid

On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:

> Andrew Clayton wrote:
>
<snip>
>
> > I guess I should take this to the XFS folks.
>
> Try mounting the filesystem "noatime" and see if that's part of the
> problem.

Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
regression.

http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2

Cheers,

Andrew
* Re: RAID 5 performance issue.
  2007-10-11 18:07 ` Andrew Clayton
@ 2007-10-11 23:43   ` Justin Piszcz
  0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-11 23:43 UTC (permalink / raw)
To: Andrew Clayton
Cc: Bill Davidsen, John Stoffel, David Rees, Andrew Clayton, linux-raid

On Thu, 11 Oct 2007, Andrew Clayton wrote:

> On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:
>
<snip>
>
>> Try mounting the filesystem "noatime" and see if that's part of the
>> problem.
>
> Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
> regression.
>
> http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2
>
> Cheers,
>
> Andrew

Nice! Thanks for reporting the final result; after 1-2 weeks of
debugging/discussion, nice that you found it.

Justin.
* Re: RAID 5 performance issue.
  2007-10-04 14:09 ` Justin Piszcz
  2007-10-04 14:10   ` Justin Piszcz
@ 2007-10-04 14:36   ` Andrew Clayton
  2007-10-04 14:39     ` Justin Piszcz
  2007-10-04 14:39     ` Justin Piszcz
  1 sibling, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 14:36 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:

> Is NCQ enabled on the drives?

I don't think the drives are capable of that. I don't see any mention
of NCQ in dmesg.

Andrew
* Re: RAID 5 performance issue.
  2007-10-04 14:36 ` Andrew Clayton
@ 2007-10-04 14:39   ` Justin Piszcz
  2007-10-04 15:03     ` Andrew Clayton
  1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 14:39 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007, Andrew Clayton wrote:

> On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:
>
>> Is NCQ enabled on the drives?
>
> I don't think the drives are capable of that. I don't see any mention
> of NCQ in dmesg.
>
> Andrew

What type (make/model) are the drives? True, the controller may not be
able to do it either. What types of disks/controllers again?

Justin.
* Re: RAID 5 performance issue.
  2007-10-04 14:39 ` Justin Piszcz
@ 2007-10-04 15:03   ` Andrew Clayton
  2007-10-04 16:19     ` Justin Piszcz
  2007-10-04 16:46     ` Steve Cousins
  0 siblings, 2 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 15:03 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:

> What type (make/model) are the drives?

The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100.

> True, the controller may not be able to do it either.
>
> What types of disks/controllers again?

The RAID disks are currently connected to a Silicon Image PCI card and
configured as a software RAID 5.

03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
        Subsystem: Silicon Image, Inc. Unknown device 7124
        Flags: bus master, stepping, 66MHz, medium devsel, latency 64, IRQ 16
        Memory at feafec00 (64-bit, non-prefetchable) [size=128]
        Memory at feaf0000 (64-bit, non-prefetchable) [size=32K]
        I/O ports at bc00 [size=16]
        Expansion ROM at fea00000 [disabled] [size=512K]
        Capabilities: [64] Power Management version 2
        Capabilities: [40] PCI-X non-bridge device
        Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

The problem originated when the disks were connected to the on board
Silicon Image 3114 controller.

> Justin.

Andrew
* Re: RAID 5 performance issue.
  2007-10-04 15:03 ` Andrew Clayton
@ 2007-10-04 16:19   ` Justin Piszcz
  2007-10-04 19:01     ` Andrew Clayton
  1 sibling, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 16:19 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007, Andrew Clayton wrote:

> The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100.
>
> The RAID disks are currently connected to a Silicon Image PCI card and
> configured as a software RAID 5.
>
> 03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
>
<snip>
>
> The problem originated when the disks were connected to the on board
> Silicon Image 3114 controller.

7K250

http://www.itreviews.co.uk/hardware/h912.htm

http://techreport.com/articles.x/8362
"The T7K250 also supports Native Command Queuing (NCQ)."

You need to enable AHCI in order to reap the benefits though.

Justin.
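A quick sketch, not from the original posts, of how one might check from a running system whether NCQ is actually in use (the sysfs path is the standard libata one; drive letters are the ones used in this thread):

#!/bin/sh
# NCQ shows up as a queue depth greater than 1 once libata has negotiated
# it; it is also logged at probe time.
for d in sdb sdc sdd; do
        echo "$d queue_depth: $(cat /sys/block/$d/device/queue_depth)"
done
dmesg | grep -i ncq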
* Re: RAID 5 performance issue.
  2007-10-04 16:19 ` Justin Piszcz
@ 2007-10-04 19:01   ` Andrew Clayton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 19:01 UTC (permalink / raw)
To: Justin Piszcz; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007 12:19:20 -0400 (EDT), Justin Piszcz wrote:

> 7K250
>
> http://www.itreviews.co.uk/hardware/h912.htm
>
> http://techreport.com/articles.x/8362
> "The T7K250 also supports Native Command Queuing (NCQ)."
>
> You need to enable AHCI in order to reap the benefits though.

Cheers, I'll take a look at that.

> Justin.

Andrew
* Re: RAID 5 performance issue.
  2007-10-04 15:03 ` Andrew Clayton
  2007-10-04 16:19   ` Justin Piszcz
@ 2007-10-04 16:46   ` Steve Cousins
  2007-10-04 17:06     ` Steve Cousins
  2007-10-04 19:06     ` Andrew Clayton
  1 sibling, 2 replies; 56+ messages in thread
From: Steve Cousins @ 2007-10-04 16:46 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid

Andrew Clayton wrote:
> On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:
>
>> What type (make/model) are the drives?
>
> The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100.

A couple of things:

1. I thought you had SATA drives
2. ATA-6 would be UDMA/133

The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2
versions do have NCQ. If you do have SATA drives, are they SATA-1 or
SATA-2?

Steve
* Re: RAID 5 performance issue.
  2007-10-04 16:46 ` Steve Cousins
@ 2007-10-04 17:06   ` Steve Cousins
  0 siblings, 0 replies; 56+ messages in thread
From: Steve Cousins @ 2007-10-04 17:06 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid

Steve Cousins wrote:

> A couple of things:
>
> 1. I thought you had SATA drives
> 2. ATA-6 would be UDMA/133

Number 2 is not correct. Sorry about that.

Steve
* Re: RAID 5 performance issue.
  2007-10-04 16:46 ` Steve Cousins
  2007-10-04 17:06   ` Steve Cousins
@ 2007-10-04 19:06   ` Andrew Clayton
  2007-10-05 10:20     ` Justin Piszcz
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-04 19:06 UTC (permalink / raw)
To: Steve Cousins; +Cc: Justin Piszcz, David Rees, Andrew Clayton, linux-raid

On Thu, 04 Oct 2007 12:46:05 -0400, Steve Cousins wrote:

> The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2
> versions do have NCQ. If you do have SATA drives, are they SATA-1 or
> SATA-2?

Not sure, I suspect SATA 1 seeing as we've had them nearly 3 years.

Some bits from dmesg:

ata1: SATA max UDMA/100 cmd 0xffffc20000aa4880 ctl 0xffffc20000aa488a bmdma 0xffffc20000aa4800 irq 19
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100
ata1.00: 488397168 sectors, multi 16: LBA48
ata1.00: configured for UDMA/100

> Steve

Andrew
* Re: RAID 5 performance issue.
  2007-10-04 19:06 ` Andrew Clayton
@ 2007-10-05 10:20   ` Justin Piszcz
  0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-05 10:20 UTC (permalink / raw)
To: Andrew Clayton; +Cc: Steve Cousins, David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007, Andrew Clayton wrote:

> Not sure, I suspect SATA 1 seeing as we've had them nearly 3 years.
>
> Some bits from dmesg:
>
> ata1: SATA max UDMA/100 cmd 0xffffc20000aa4880 ctl 0xffffc20000aa488a bmdma 0xffffc20000aa4800 irq 19
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100
> ata1.00: 488397168 sectors, multi 16: LBA48
> ata1.00: configured for UDMA/100

Looks like SATA1 (non-NCQ) to me.

Justin.
* Re: RAID 5 performance issue.
  2007-10-04 14:36 ` Andrew Clayton
  2007-10-04 14:39   ` Justin Piszcz
@ 2007-10-04 14:39   ` Justin Piszcz
  1 sibling, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-04 14:39 UTC (permalink / raw)
To: Andrew Clayton; +Cc: David Rees, Andrew Clayton, linux-raid

On Thu, 4 Oct 2007, Andrew Clayton wrote:

> On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:
>
>> Is NCQ enabled on the drives?
>
> I don't think the drives are capable of that. I don't see any mention
> of NCQ in dmesg.
>
> Andrew

BTW: You may not see 'NCQ' in the kernel messages unless you enable
AHCI.

Justin.
* Re: RAID 5 performance issue.
  2007-10-03  9:53 RAID 5 performance issue Andrew Clayton
  2007-10-03 16:43 ` Justin Piszcz
@ 2007-10-03 17:53 ` Goswin von Brederlow
  2007-10-03 20:20   ` Andrew Clayton
  2007-10-05 20:25 ` Brendan Conoboy
  2 siblings, 1 reply; 56+ messages in thread
From: Goswin von Brederlow @ 2007-10-03 17:53 UTC (permalink / raw)
To: Andrew Clayton; +Cc: linux-raid

Andrew Clayton <andrew@pccl.info> writes:

> Hi,
>
> Hardware:
>
> Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root
> file system) is connected to the onboard Silicon Image 3114 controller.
> The other 3 (/home) are in a software RAID 5 connected to a PCI Silicon
> Image 3124 card. I moved the 3 raid disks off the on board controller
> onto the card the other day to see if that would help, it didn't.

I would think the onboard controller is connected to the north or south
bridge and possibly hooked directly into the hypertransport. The extra
controller is PCI, so you are limited to a theoretical 128MiB/s. For me
the onboard chips do much better (though at higher cpu cost) than PCI
cards.

MfG
        Goswin
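The 128MiB/s figure follows directly from the bus width and clock. A small sketch, not from the original posts, for checking what the card actually sits on (lspci options are standard; the arithmetic is in the comments):

#!/bin/sh
# Plain PCI: 32 bits x 33 MHz / 8 = ~133 MB/s, shared by everything on
# that bus, so three RAID-5 members plus parity traffic saturate it
# quickly. PCI-X at 64 bits x 66-133 MHz is roughly 4-8x that.
lspci -v | grep -A 3 -i 'SiI 3124'   # shows 66MHz vs PCI-X and the flags
lspci -t                             # tree view: what shares that bus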
* Re: RAID 5 performance issue.
  2007-10-03 17:53 ` Goswin von Brederlow
@ 2007-10-03 20:20   ` Andrew Clayton
  2007-10-03 20:48     ` Richard Scobie
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Clayton @ 2007-10-03 20:20 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Andrew Clayton, linux-raid

On Wed, 03 Oct 2007 19:53:08 +0200, Goswin von Brederlow wrote:

> I would think the onboard controller is connected to the north or south
> bridge and possibly hooked directly into the hypertransport. The extra
> controller is PCI, so you are limited to a theoretical 128MiB/s. For me
> the onboard chips do much better (though at higher cpu cost) than PCI
> cards.

Yeah, I was wondering about that. It certainly hasn't improved things;
it's unclear if it's made things any worse.

> MfG
> Goswin

Cheers,

Andrew
* Re: RAID 5 performance issue.
  2007-10-03 20:20 ` Andrew Clayton
@ 2007-10-03 20:48   ` Richard Scobie
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Scobie @ 2007-10-03 20:48 UTC (permalink / raw)
To: linux-raid

Andrew Clayton wrote:

> Yeah, I was wondering about that. It certainly hasn't improved things;
> it's unclear if it's made things any worse.

Many 3124 cards are PCI-X, so if you have one of these (and you seem to
be using a server board which may well have PCI-X), bus performance is
not going to be an issue.

Regards,

Richard
* Re: RAID 5 performance issue.
  2007-10-03  9:53 RAID 5 performance issue Andrew Clayton
  2007-10-03 16:43 ` Justin Piszcz
  2007-10-03 17:53 ` Goswin von Brederlow
@ 2007-10-05 20:25 ` Brendan Conoboy
  2007-10-06  0:38   ` Dean S. Messing
  2 siblings, 1 reply; 56+ messages in thread
From: Brendan Conoboy @ 2007-10-05 20:25 UTC (permalink / raw)
To: Andrew Clayton; +Cc: linux-raid

Andrew Clayton wrote:
> If anyone has any idea's I'm all ears.

Hi Andrew,

Are you sure your drives are healthy? Try benchmarking each drive
individually and see if there is a dramatic performance difference
between any of them. One failing drive can slow down an entire array.
Only after you have determined that your drives are healthy when
accessed individually are combined results particularly meaningful.
For a generic SATA 1 drive you should expect a sustained raw read or
write in excess of 45 MB/s. Check both read and write (this will
destroy data) and make sure your cache is clear prior to the read test
and after the write test.

If each drive is working at a reasonable rate individually, you're
ready to move on. The next question is: what happens when you access
more than one device at the same time? You should either get nearly
full combined performance, max out CPU, or get throttled by bus
bandwidth (an actual kernel bug could also come into play here, but I
tend to doubt it). Is the onboard SATA controller real SATA or just an
ATA-SATA converter? If the latter, you're going to have trouble getting
faster performance than any one disk can give you at a time. The output
of 'lspci' should tell you if the onboard SATA controller is on its own
bus or sharing space with some other device. Pasting the output here
would be useful.

Assuming you get good performance out of all 3 drives at the same time,
it's time to create a RAID 5 md device with the three, make sure your
parity is done building, then benchmark that. It's going to be slower
to write and a bit slower to read (especially if your CPU is maxed
out), but that is normal.

Assuming you get good performance out of your md device, it's time to
put your filesystem on the md device and benchmark that. If you use
ext3, remember to set the stride parameter per the raid howto. I am
unfamiliar with other fs/md interactions, so be sure to check.

If you're actually maxing out your bus bandwidth and the onboard sata
controller is on a different bus than the pci sata controller, try
balancing the drives between the two to get a larger combined pipe.

Good luck,

-- 
Brendan Conoboy / Red Hat, Inc. / blc@redhat.com
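A sketch of the per-drive and concurrent read test described above, not from the original posts (read-only, so it is safe on a live but quiet array; drop_caches needs a 2.6.16+ kernel and root; device names are the ones used in this thread):

#!/bin/sh
# Raw sequential read per member, first one at a time, then all at once.
# A healthy drive of this era should manage well over 45 MB/s on its own;
# the concurrent run shows whether the bus or controller is the limit.
DRIVES="/dev/sdb /dev/sdc /dev/sdd"
for d in $DRIVES; do
        echo 3 > /proc/sys/vm/drop_caches
        echo "== $d alone =="
        dd if=$d of=/dev/null bs=1M count=1000
done
echo 3 > /proc/sys/vm/drop_caches
echo "== all drives concurrently =="
for d in $DRIVES; do
        dd if=$d of=/dev/null bs=1M count=1000 &
done
wait

dd prints the throughput for each run on stderr; the write-side test is deliberately left out here since, as noted above, it would destroy data.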
* Re: RAID 5 performance issue.
  2007-10-05 20:25 ` Brendan Conoboy
@ 2007-10-06  0:38   ` Dean S. Messing
  2007-10-06  8:18     ` Justin Piszcz
  1 sibling, 1 reply; 56+ messages in thread
From: Dean S. Messing @ 2007-10-06 0:38 UTC (permalink / raw)
To: linux-raid; +Cc: blc

Brendan Conoboy wrote:
<snip>
> Is the onboard SATA controller real SATA or just an ATA-SATA
> converter? If the latter, you're going to have trouble getting faster
> performance than any one disk can give you at a time. The output
> of 'lspci' should tell you if the onboard SATA controller is on its own
> bus or sharing space with some other device. Pasting the output here
> would be useful.
<snip>

N00bee question:

How does one tell if a machine's disk controller is an ATA-SATA
converter?

The output of `lspci | fgrep -i sata' is:

00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)

which suggests a real SATA controller. These references to ATA in
"dmesg", however, make me wonder.

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
ata1.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
ata2.00: 312500000 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133

Dean
* Re: RAID 5 performance issue.
  2007-10-06  0:38 ` Dean S. Messing
@ 2007-10-06  8:18   ` Justin Piszcz
  2007-10-08  1:40     ` Dean S. Messing
  0 siblings, 1 reply; 56+ messages in thread
From: Justin Piszcz @ 2007-10-06 8:18 UTC (permalink / raw)
To: Dean S. Messing; +Cc: linux-raid, blc

On Fri, 5 Oct 2007, Dean S. Messing wrote:

> N00bee question:
>
> How does one tell if a machine's disk controller is an ATA-SATA
> converter?
>
<snip>

His drives are either really old and do not support NCQ or he is not
using AHCI in the BIOS.

Justin.
* Re: RAID 5 performance issue.
  2007-10-06  8:18 ` Justin Piszcz
@ 2007-10-08  1:40   ` Dean S. Messing
  2007-10-08  8:44     ` Justin Piszcz
  0 siblings, 1 reply; 56+ messages in thread
From: Dean S. Messing @ 2007-10-08 1:40 UTC (permalink / raw)
To: jpiszcz; +Cc: linux-raid, blc

Justin Piszcz wrote:
> On Fri, 5 Oct 2007, Dean S. Messing wrote:
>
<snip>
>
> His drives are either really old and do not support NCQ or he is not
> using AHCI in the BIOS.

Sorry, Justin, if I wasn't clear. I was asking the N00bee question
about _my_own_ machine. The output of lspci (on my machine) seems to
indicate I have a "real" SATA controller on the motherboard, but the
contents of "dmesg", with the references to ATA-7 and UDMA/133, made
me wonder if I had just an ATA-SATA converter. Hence my question: how
does one tell definitively if one has a real SATA controller on the
motherboard?
* Re: RAID 5 performance issue.
  2007-10-08  1:40 ` Dean S. Messing
@ 2007-10-08  8:44   ` Justin Piszcz
  0 siblings, 0 replies; 56+ messages in thread
From: Justin Piszcz @ 2007-10-08 8:44 UTC (permalink / raw)
To: Dean S. Messing; +Cc: linux-raid, blc

On Sun, 7 Oct 2007, Dean S. Messing wrote:

> Sorry, Justin, if I wasn't clear. I was asking the N00bee question
> about _my_own_ machine. The output of lspci (on my machine) seems to
> indicate I have a "real" SATA controller on the motherboard, but the
> contents of "dmesg", with the references to ATA-7 and UDMA/133, made
> me wonder if I had just an ATA-SATA converter. Hence my question: how
> does one tell definitively if one has a real SATA controller on the
> motherboard?

The output looks like a real (AHCI-capable) SATA controller, and your
drives are using NCQ/AHCI. Output from one of my machines:

[   23.621462] ata1: SATA max UDMA/133 cmd 0xf8812100 ctl 0x00000000 bmdma 0x00000000 irq 219
[   24.078390] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   24.549806] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

As far as why it shows UDMA/133 in the kernel output, I am sure there
is a reason :) I know in the older SATA drives there was a bridge chip
used to convert the drive from IDE<->SATA; maybe it is from those
legacy days, not sure. With the newer NCQ/'native' SATA drives, the
bridge chip should no longer exist.

Justin.
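One reasonably definitive check, as a sketch that is not from the original posts (the sysfs path is standard; 0106 is the PCI class code lspci -nn reports for SATA controllers):

#!/bin/sh
# A native SATA controller shows up with PCI class 0106 and, when driven
# by the ahci driver, negotiates NCQ (queue_depth > 1). A drive behind an
# IDE<->SATA bridge, or a controller left in legacy IDE mode, will not.
lspci -nn | grep -i -e sata -e ide
dmesg | grep -i -e ahci -e ncq
for q in /sys/block/sd*/device/queue_depth; do
        echo "$q: $(cat $q)"
done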