* Striping does not increase performance.
@ 2012-03-12 12:34 Caspar Smit
2012-03-12 12:57 ` Erwan MAS
2012-03-12 14:20 ` David Brown
0 siblings, 2 replies; 11+ messages in thread
From: Caspar Smit @ 2012-03-12 12:34 UTC (permalink / raw)
To: linux-raid, fio
[-- Attachment #1: Type: text/plain, Size: 8071 bytes --]
Hi all,
I don't know exactly which mailing lists to use for this one, so I hope
I used the right ones.
I did some performance testing on a new system and found out some
things I couldn't explain or didn't expect.
At the end are some questions I hope to get answered to explain the
things I'm seeing in the tests.
First of all a description of my setup.
The server is a 36-bay 3.5" Supermicro chassis filled with 36x 2TB
SATA 7200 RPM disks. I use a single SSD as the OS disk (Debian Squeeze
stable) connected to motherboard SATA port 1. The data disks are
connected to an LSI 9211-8i controller in IT mode (non-raid).
Once booted, I created 3 raid6 MD devices of 10 disks each (16TB net
each) with 6 global hot spares in the same spare-group.
All MD devices have a chunk size of 64KB.
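For completeness, each array was created with something along these lines
(a sketch from memory; the exact command and spare handling may have
differed slightly):

mdadm --create /dev/md0 --level=6 --raid-devices=10 --chunk=64 \
      /dev/sd[b-k]

with md1 and md2 built the same way from the next ten disks, and the
spares sharing a spare-group in mdadm.conf.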
I started testing using fio (2.0.4)
I used a bandwidth random read test I found on the Fusion IO website.
After every test I ran: sync; echo 3 >/proc/sys/vm/drop_caches;
The first test was on a single md device.
fio --name=test1 --ioengine=sync --direct=1 --rw=randread --bs=1m
--runtime=10 --filename=/dev/md0 --iodepth=1 --invalidate=1
(See the attachment for the full fio output; I'll only post the snippets
that look important to me.)
read : io=518144KB, bw=51726KB/s, iops=50 , runt= 10017msec
Disk stats (read/write):
md0: ios=8000/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=674/0, aggrmerge=0/0, aggrticks=10183/0, aggrin_queue=10183,
aggrutil=77.97%
sdc: ios=830/0, merge=0/0, ticks=12584/0, in_queue=12584, util=63.15%
So the IOPS for this test is 50 and the bandwidth around 50MB/s (50
IOPS x 1MB block size).
The md0 device gets 8000 total IO's. Divided by 10 seconds this is 800
IOPS (so the 1MB blocks go to the disks in 64KB blocks because the
chunk size of the md is 64KB; 800 IOPS is 16x the reported 50 IOPS
above, and 1MB / 16 = 64KB).
In other words, md0 gets 800 IOPS in 64KB blocks, which corresponds to 50
IOPS in 1MB blocks.
Each disk gets around 800 total IO's; divided by 10 that is 80 IOPS, which
is somewhat lower than the number you would expect from a 7200 RPM
SATA disk (around 110 IOPS).
For the next test I introduced LVM (created a VG with md0 as the PV, then
created a 2TB LV on it).
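Roughly like this (a sketch; the VG/LV names here are made up):

pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 2T -n lv0 vg0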
fio --name=test2 --ioengine=sync --direct=1 --rw=randread --bs=1m
--runtime=10 --filename=/dev/dm-0 --iodepth=1 --invalidate=1
read : io=705536KB, bw=70476KB/s, iops=68 , runt= 10011msec
Disk stats (read/write):
dm-0: ios=10880/0, merge=0/0, ticks=114720/0, in_queue=114784,
util=98.86%, aggrios=11024/0, aggrmerge=0/0, aggrticks=0/0,
aggrin_queue=0, aggrutil=0.00%
md0: ios=11024/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=918/0, aggrmerge=0/0, aggrticks=9684/0, aggrin_queue=9684,
aggrutil=74.05%
sdc: ios=1106/0, merge=0/0, ticks=11668/0, in_queue=11668, util=72.07%
Now the IOPS went from 50 to around 70 and the bandwidth from 50MB/s
to 70MB/s, which is not what I expected: I introduced another
layer, yet the performance increased.
The dm-0 device gets 10880 total IO's. Divided by 10 seconds this is
1088 IOPS (so again the 1MB blocks go to the disks in 64KB blocks because
the chunk size of the md is 64KB; 1088 IOPS is 16x the reported 68 IOPS
above, and 1MB / 16 = 64KB).
Each disk now does around 110 IOPS, which is what you would expect for
this type of disk.
For the next test I wanted to see if I could double the performance by
striping an LV over 2 md's (so instead of using 10 disks/spindles, use
20 disks/spindles).
So I added md1 to the VG as a PV, created a fresh LV striped across the
two PV's using a 64KB stripe size, and ran the test again.
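Roughly (again a sketch with the assumed vg0 name; -i is the number of
stripes, -I the stripe size in KB):

lvcreate -L 2T -i 2 -I 64 -n lv0 vg0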
fio --name=test3 --ioengine=sync --direct=1 --rw=randread --bs=1m
--runtime=10 --filename=/dev/dm-0 --iodepth=1 --invalidate=1
Now things are getting interesting:
read : io=769024KB, bw=76849KB/s, iops=75 , runt= 10007msec
Disk stats (read/write):
dm-0: ios=190464/0, merge=0/0, ticks=1695940/0, in_queue=1695940,
util=98.03%, aggrios=96128/0, aggrmerge=0/0, aggrticks=0/0,
aggrin_queue=0, aggrutil=0.00%
md0: ios=96128/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=502/0, aggrmerge=7508/0, aggrticks=4517/0, aggrin_queue=4517,
aggrutil=55.56%
sdc: ios=583/0, merge=8745/0, ticks=5100/0, in_queue=5100, util=50.53%
md1: ios=96128/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=503/0, aggrmerge=7507/0, aggrticks=4469/0, aggrin_queue=4469,
aggrutil=55.77%
Now the total IO's in 10 seconds are 16x larger than before: 190464 /
10 = 19046.4, 19046.4 / 16 = 1190.4, and 1190.4 / 16 = the reported 75
IOPS above.
So the 64KB blocks seem to be split into 4KB blocks (64 / 16 = 4),
which results in a much larger number of total IO's.
The IO's per disk still seem to be in 64KB blocks, only now with a
large MERGE figure beside them. (Are the 4KB blocks now merged back into
64KB blocks?)
The performance does not double but stays the same as with 1 MD set;
only the total IO's are spread among the MD's. Each disk now does
around 60 IOPS!
I still wanted to see if I could double the performance and thought it
might have something to do with LVM striping, so I ditched LVM and
created a RAID0 (md6) over md0 and md1, again with a chunk size of
64KB.
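Something like (sketch):

mdadm --create /dev/md6 --level=0 --raid-devices=2 --chunk=64 \
      /dev/md0 /dev/md1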
fio --name=test4 --ioengine=sync --direct=1 --rw=randread --bs=1m
--runtime=10 --filename=/dev/md6 --iodepth=1 --invalidate=1
Same story as with LVM striping, only the performance is even worse:
read : io=462848KB, bw=46257KB/s, iops=45 , runt= 10006msec
md6: ios=114432/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=57856/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0,
aggrutil=0.00%
md0: ios=57856/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=302/0, aggrmerge=4519/0, aggrticks=4782/0, aggrin_queue=4782,
aggrutil=59.46%
sdc: ios=360/0, merge=5368/0, ticks=5676/0, in_queue=5676, util=55.89%
The IO's are spread among the md's and each disk only does 36 IOPS now!
As a final test I wanted to know how 20 disks in a single raid6
performed, so I created 1 md out of 20 disks (still using a 64KB chunk
size).
fio --name=test5 --ioengine=sync --direct=1 --rw=randread --bs=1m
--runtime=10 --filename=/dev/md0 --iodepth=1 --invalidate=1
read : io=504832KB, bw=50453KB/s, iops=49 , runt= 10006msec
md0: ios=7808/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=328/0, aggrmerge=0/0, aggrticks=4925/0, aggrin_queue=4925,
aggrutil=61.63%
sdb: ios=395/0, merge=0/0, ticks=5652/0, in_queue=5652, util=56.00%
50 IOPS total (1MB blocks) and each disk is doing around 40 IOPS (64KB blocks)
And adding LVM (a VG over md0 and a 2TB LV) on top of that:
fio --name=test6 --ioengine=sync --direct=1 --rw=randread --bs=1m
--runtime=10 --filename=/dev/dm-0 --iodepth=1 --invalidate=1
read : io=801792KB, bw=80163KB/s, iops=78 , runt= 10002msec
dm-0: ios=12416/0, merge=0/0, ticks=108252/0, in_queue=108252,
util=98.82%, aggrios=12528/0, aggrmerge=0/0, aggrticks=0/0,
aggrin_queue=0, aggrutil=0.00%
md0: ios=12528/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=522/0, aggrmerge=0/0, aggrticks=4547/0, aggrin_queue=4547,
aggrutil=55.80%
sdb: ios=637/0, merge=0/0, ticks=5492/0, in_queue=5492, util=54.41%
78 IOPS total (1MB blocks) and each disk is doing around 64 IOPS (64KB blocks)
I wanted to rule out the LSI controller as a bottleneck, so I installed
a second LSI 9211-8i, connected 10 disks to controller 1 and 10
disks to controller 2, and created a 20-disk raid6 with these disks.
The results (test7 and test8) are pretty much the same as with one
controller.
So my questions are:
1) Am I overlooking/not understanding something obvious about why I
can't improve performance on the system?
2) Why are the LVM tests performing better as opposed to only using MD(s)?
3) Why is the performance in test3 split between the two PV's and not
aggregated? Is there a bottleneck somewhere, and if so, how can I find
it?
4) Why are the IO's suddenly split into 4KB blocks when using
striping/raid0? All chunk/block/stripe sizes are 64KB.
5) Any recommendations on how to improve performance with this
configuration so it is not limited to the performance of 10 disks?
Kind regards,
Caspar Smit
[-- Attachment #2: fio-tests.txt --]
[-- Type: text/plain, Size: 23964 bytes --]
test1: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test1: (groupid=0, jobs=1): err= 0: pid=12218
read : io=518144KB, bw=51726KB/s, iops=50 , runt= 10017msec
clat (usec): min=694 , max=268167 , avg=19790.64, stdev=13161.97
lat (usec): min=694 , max=268167 , avg=19790.91, stdev=13161.97
clat percentiles (usec):
| 1.00th=[ 732], 5.00th=[11328], 10.00th=[14784], 20.00th=[16512],
| 30.00th=[17536], 40.00th=[18560], 50.00th=[19328], 60.00th=[20096],
| 70.00th=[21120], 80.00th=[22144], 90.00th=[23936], 95.00th=[25216],
| 99.00th=[56064], 99.50th=[69120], 99.90th=[268288]
bw (KB/s) : min=22304, max=66316, per=99.84%, avg=51642.68, stdev=7977.53
lat (usec) : 750=2.37%, 1000=1.78%
lat (msec) : 2=0.20%, 20=54.15%, 50=40.12%, 100=1.19%, 500=0.20%
cpu : usr=0.00%, sys=0.72%, ctx=514, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=506/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=518144KB, aggrb=51726KB/s, minb=52967KB/s, maxb=52967KB/s, mint=10017msec, maxt=10017msec
Disk stats (read/write):
md0: ios=8000/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=674/0, aggrmerge=0/0, aggrticks=10183/0, aggrin_queue=10183, aggrutil=77.97%
sdc: ios=830/0, merge=0/0, ticks=12584/0, in_queue=12584, util=63.15%
sdd: ios=818/0, merge=0/0, ticks=12480/0, in_queue=12480, util=75.60%
sde: ios=806/0, merge=0/0, ticks=13000/0, in_queue=13000, util=65.24%
sdf: ios=816/0, merge=0/0, ticks=12228/0, in_queue=12228, util=74.93%
sdg: ios=826/0, merge=0/0, ticks=12412/0, in_queue=12412, util=62.04%
sdh: ios=812/0, merge=0/0, ticks=12092/0, in_queue=12092, util=75.72%
sdi: ios=798/0, merge=0/0, ticks=11916/0, in_queue=11916, util=59.83%
sdj: ios=793/0, merge=0/0, ticks=12160/0, in_queue=12160, util=77.97%
sdk: ios=788/0, merge=0/0, ticks=11140/0, in_queue=11140, util=55.88%
sdaf: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdag: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdb: ios=809/0, merge=0/0, ticks=12188/0, in_queue=12188, util=74.93%
----
test2: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test2: (groupid=0, jobs=1): err= 0: pid=15396
read : io=705536KB, bw=70476KB/s, iops=68 , runt= 10011msec
clat (msec): min=9 , max=264 , avg=14.52, stdev=12.99
lat (msec): min=9 , max=264 , avg=14.52, stdev=12.99
clat percentiles (msec):
| 1.00th=[ 11], 5.00th=[ 12], 10.00th=[ 12], 20.00th=[ 13],
| 30.00th=[ 14], 40.00th=[ 14], 50.00th=[ 14], 60.00th=[ 15],
| 70.00th=[ 15], 80.00th=[ 16], 90.00th=[ 16], 95.00th=[ 17],
| 99.00th=[ 20], 99.50th=[ 24], 99.90th=[ 265]
bw (KB/s) : min= 4055, max=75776, per=100.00%, avg=70507.00, stdev=16129.06
lat (msec) : 10=0.58%, 20=98.55%, 50=0.58%, 250=0.15%, 500=0.15%
cpu : usr=0.00%, sys=1.28%, ctx=708, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=689/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=705536KB, aggrb=70476KB/s, minb=72167KB/s, maxb=72167KB/s, mint=10011msec, maxt=10011msec
Disk stats (read/write):
dm-0: ios=10880/0, merge=0/0, ticks=114720/0, in_queue=114784, util=98.86%, aggrios=11024/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
md0: ios=11024/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=918/0, aggrmerge=0/0, aggrticks=9684/0, aggrin_queue=9684, aggrutil=74.05%
sdc: ios=1106/0, merge=0/0, ticks=11668/0, in_queue=11668, util=72.07%
sdd: ios=1100/0, merge=0/0, ticks=11700/0, in_queue=11700, util=58.90%
sde: ios=1105/0, merge=0/0, ticks=12012/0, in_queue=12012, util=74.05%
sdf: ios=1110/0, merge=0/0, ticks=11872/0, in_queue=11872, util=59.89%
sdg: ios=1119/0, merge=0/0, ticks=11256/0, in_queue=11256, util=70.29%
sdh: ios=1128/0, merge=0/0, ticks=11764/0, in_queue=11764, util=59.18%
sdi: ios=1095/0, merge=0/0, ticks=11552/0, in_queue=11552, util=73.42%
sdj: ios=1062/0, merge=0/0, ticks=11372/0, in_queue=11372, util=57.24%
sdk: ios=1087/0, merge=0/0, ticks=11344/0, in_queue=11344, util=71.08%
sdaf: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdag: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdb: ios=1112/0, merge=0/0, ticks=11676/0, in_queue=11676, util=58.58%
----
test3: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test3: (groupid=0, jobs=1): err= 0: pid=19157
read : io=769024KB, bw=76849KB/s, iops=75 , runt= 10007msec
clat (msec): min=9 , max=22 , avg=13.32, stdev= 1.60
lat (msec): min=9 , max=22 , avg=13.32, stdev= 1.60
clat percentiles (usec):
| 1.00th=[10816], 5.00th=[11328], 10.00th=[11712], 20.00th=[12224],
| 30.00th=[12608], 40.00th=[12864], 50.00th=[13120], 60.00th=[13376],
| 70.00th=[13760], 80.00th=[14016], 90.00th=[14656], 95.00th=[15296],
| 99.00th=[20608], 99.50th=[21120], 99.90th=[22400]
bw (KB/s) : min=72000, max=79395, per=100.00%, avg=76960.79, stdev=1873.61
lat (msec) : 10=0.13%, 20=98.27%, 50=1.60%
cpu : usr=0.00%, sys=3.48%, ctx=811, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=751/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=769024KB, aggrb=76848KB/s, minb=78692KB/s, maxb=78692KB/s, mint=10007msec, maxt=10007msec
Disk stats (read/write):
dm-0: ios=190464/0, merge=0/0, ticks=1695940/0, in_queue=1695940, util=98.03%, aggrios=96128/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
md0: ios=96128/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=502/0, aggrmerge=7508/0, aggrticks=4517/0, aggrin_queue=4517, aggrutil=55.56%
sdc: ios=583/0, merge=8745/0, ticks=5100/0, in_queue=5100, util=50.53%
sdd: ios=604/0, merge=8996/0, ticks=5408/0, in_queue=5408, util=53.34%
sde: ios=607/0, merge=9073/0, ticks=5500/0, in_queue=5500, util=54.37%
sdf: ios=595/0, merge=8893/0, ticks=5276/0, in_queue=5276, util=52.15%
sdg: ios=609/0, merge=9087/0, ticks=5532/0, in_queue=5532, util=54.61%
sdh: ios=621/0, merge=9299/0, ticks=5612/0, in_queue=5612, util=55.56%
sdi: ios=619/0, merge=9205/0, ticks=5580/0, in_queue=5580, util=55.01%
sdj: ios=594/0, merge=8878/0, ticks=5464/0, in_queue=5464, util=53.98%
sdk: ios=597/0, merge=8939/0, ticks=5316/0, in_queue=5316, util=52.60%
sdaf: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdag: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdb: ios=601/0, merge=8983/0, ticks=5420/0, in_queue=5420, util=53.55%
md1: ios=96128/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=503/0, aggrmerge=7507/0, aggrticks=4469/0, aggrin_queue=4469, aggrutil=55.77%
sdm: ios=589/0, merge=8739/0, ticks=5320/0, in_queue=5320, util=52.40%
sdn: ios=601/0, merge=8999/0, ticks=5200/0, in_queue=5200, util=51.41%
sdo: ios=610/0, merge=9070/0, ticks=5440/0, in_queue=5440, util=53.63%
sdp: ios=593/0, merge=8895/0, ticks=5284/0, in_queue=5284, util=52.36%
sdq: ios=610/0, merge=9086/0, ticks=5320/0, in_queue=5320, util=52.40%
sdr: ios=623/0, merge=9297/0, ticks=5648/0, in_queue=5648, util=55.77%
sds: ios=618/0, merge=9206/0, ticks=5592/0, in_queue=5592, util=55.21%
sdt: ios=594/0, merge=8878/0, ticks=5204/0, in_queue=5204, util=51.45%
sdu: ios=600/0, merge=8936/0, ticks=5324/0, in_queue=5324, util=52.52%
sdah: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdai: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdl: ios=603/0, merge=8981/0, ticks=5296/0, in_queue=5296, util=52.16%
----
test4: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test4: (groupid=0, jobs=1): err= 0: pid=23762
read : io=462848KB, bw=46257KB/s, iops=45 , runt= 10006msec
clat (msec): min=11 , max=267 , avg=22.13, stdev=19.49
lat (msec): min=11 , max=267 , avg=22.13, stdev=19.49
clat percentiles (msec):
| 1.00th=[ 14], 5.00th=[ 16], 10.00th=[ 17], 20.00th=[ 19],
| 30.00th=[ 20], 40.00th=[ 20], 50.00th=[ 21], 60.00th=[ 22],
| 70.00th=[ 23], 80.00th=[ 23], 90.00th=[ 25], 95.00th=[ 25],
| 99.00th=[ 34], 99.50th=[ 245], 99.90th=[ 269]
bw (KB/s) : min= 3923, max=54748, per=100.00%, avg=46260.32, stdev=12218.61
lat (msec) : 20=42.92%, 50=56.19%, 100=0.22%, 250=0.22%, 500=0.44%
cpu : usr=0.08%, sys=1.64%, ctx=470, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=452/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=462848KB, aggrb=46257KB/s, minb=47367KB/s, maxb=47367KB/s, mint=10006msec, maxt=10006msec
Disk stats (read/write):
md6: ios=114432/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=57856/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
md0: ios=57856/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=302/0, aggrmerge=4519/0, aggrticks=4782/0, aggrin_queue=4782, aggrutil=59.46%
sdc: ios=360/0, merge=5368/0, ticks=5676/0, in_queue=5676, util=55.89%
sdd: ios=365/0, merge=5427/0, ticks=5988/0, in_queue=5988, util=58.98%
sde: ios=369/0, merge=5535/0, ticks=6000/0, in_queue=6000, util=59.46%
sdf: ios=365/0, merge=5475/0, ticks=5628/0, in_queue=5628, util=55.77%
sdg: ios=361/0, merge=5383/0, ticks=5548/0, in_queue=5548, util=54.74%
sdh: ios=357/0, merge=5323/0, ticks=5592/0, in_queue=5592, util=55.18%
sdi: ios=354/0, merge=5294/0, ticks=5736/0, in_queue=5736, util=56.69%
sdj: ios=365/0, merge=5475/0, ticks=5872/0, in_queue=5872, util=58.20%
sdk: ios=369/0, merge=5535/0, ticks=5736/0, in_queue=5736, util=56.85%
sdaf: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdag: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdb: ios=362/0, merge=5414/0, ticks=5616/0, in_queue=5616, util=55.42%
md1: ios=57856/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=301/0, aggrmerge=4519/0, aggrticks=4770/0, aggrin_queue=4770, aggrutil=59.15%
sdm: ios=359/0, merge=5369/0, ticks=5708/0, in_queue=5708, util=56.45%
sdn: ios=362/0, merge=5430/0, ticks=5668/0, in_queue=5668, util=56.17%
sdo: ios=371/0, merge=5533/0, ticks=6000/0, in_queue=6000, util=59.15%
sdp: ios=366/0, merge=5474/0, ticks=5900/0, in_queue=5900, util=58.35%
sdq: ios=359/0, merge=5385/0, ticks=5804/0, in_queue=5804, util=57.52%
sdr: ios=355/0, merge=5325/0, ticks=5492/0, in_queue=5492, util=54.43%
sds: ios=353/0, merge=5295/0, ticks=5436/0, in_queue=5436, util=53.88%
sdt: ios=365/0, merge=5475/0, ticks=5620/0, in_queue=5620, util=55.70%
sdu: ios=370/0, merge=5534/0, ticks=5700/0, in_queue=5700, util=56.29%
sdah: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdai: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdl: ios=362/0, merge=5414/0, ticks=5920/0, in_queue=5920, util=58.55%
----
test5: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test5: (groupid=0, jobs=1): err= 0: pid=32210
read : io=504832KB, bw=50453KB/s, iops=49 , runt= 10006msec
clat (msec): min=11 , max=60 , avg=20.29, stdev= 3.76
lat (msec): min=11 , max=60 , avg=20.29, stdev= 3.76
clat percentiles (usec):
| 1.00th=[13248], 5.00th=[15680], 10.00th=[16768], 20.00th=[18048],
| 30.00th=[18816], 40.00th=[19584], 50.00th=[20096], 60.00th=[20864],
| 70.00th=[21632], 80.00th=[22400], 90.00th=[23424], 95.00th=[24704],
| 99.00th=[25984], 99.50th=[49920], 99.90th=[60160]
bw (KB/s) : min=39922, max=54425, per=99.82%, avg=50360.74, stdev=3005.20
lat (msec) : 20=48.68%, 50=50.71%, 100=0.61%
cpu : usr=0.00%, sys=0.76%, ctx=499, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=493/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=504832KB, aggrb=50452KB/s, minb=51663KB/s, maxb=51663KB/s, mint=10006msec, maxt=10006msec
Disk stats (read/write):
md0: ios=7808/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=328/0, aggrmerge=0/0, aggrticks=4925/0, aggrin_queue=4925, aggrutil=61.63%
sdb: ios=395/0, merge=0/0, ticks=5652/0, in_queue=5652, util=56.00%
sdc: ios=404/0, merge=0/0, ticks=5932/0, in_queue=5932, util=58.78%
sdd: ios=390/0, merge=0/0, ticks=5856/0, in_queue=5856, util=58.03%
sde: ios=397/0, merge=0/0, ticks=6180/0, in_queue=6180, util=61.24%
sdf: ios=396/0, merge=0/0, ticks=5940/0, in_queue=5940, util=58.86%
sdg: ios=392/0, merge=0/0, ticks=5772/0, in_queue=5772, util=57.19%
sdh: ios=385/0, merge=0/0, ticks=5600/0, in_queue=5600, util=55.49%
sdi: ios=377/0, merge=0/0, ticks=5800/0, in_queue=5800, util=57.47%
sdj: ios=378/0, merge=0/0, ticks=5768/0, in_queue=5768, util=57.15%
sdk: ios=394/0, merge=0/0, ticks=5820/0, in_queue=5820, util=57.67%
sdl: ios=398/0, merge=0/0, ticks=6108/0, in_queue=6108, util=60.52%
sdm: ios=390/0, merge=0/0, ticks=5856/0, in_queue=5856, util=58.03%
sdn: ios=402/0, merge=0/0, ticks=6000/0, in_queue=6000, util=59.45%
sdo: ios=400/0, merge=0/0, ticks=6220/0, in_queue=6220, util=61.63%
sdp: ios=398/0, merge=0/0, ticks=6068/0, in_queue=6068, util=60.13%
sdq: ios=401/0, merge=0/0, ticks=6200/0, in_queue=6200, util=61.44%
sdr: ios=401/0, merge=0/0, ticks=5832/0, in_queue=5832, util=57.79%
sds: ios=392/0, merge=0/0, ticks=5820/0, in_queue=5820, util=57.68%
sdt: ios=401/0, merge=0/0, ticks=5908/0, in_queue=5908, util=58.55%
sdu: ios=397/0, merge=0/0, ticks=5888/0, in_queue=5888, util=58.35%
sdaf: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdag: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdah: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdai: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
----
test6: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test6: (groupid=0, jobs=1): err= 0: pid=1850
read : io=801792KB, bw=80163KB/s, iops=78 , runt= 10002msec
clat (msec): min=9 , max=60 , avg=12.77, stdev= 2.77
lat (msec): min=9 , max=60 , avg=12.77, stdev= 2.77
clat percentiles (usec):
| 1.00th=[10048], 5.00th=[10944], 10.00th=[11328], 20.00th=[11712],
| 30.00th=[12096], 40.00th=[12352], 50.00th=[12608], 60.00th=[12864],
| 70.00th=[13120], 80.00th=[13376], 90.00th=[13888], 95.00th=[14144],
| 99.00th=[19840], 99.50th=[28032], 99.90th=[60672]
bw (KB/s) : min=64631, max=84329, per=100.00%, avg=80169.58, stdev=4137.01
lat (msec) : 10=0.89%, 20=98.21%, 50=0.64%, 100=0.26%
cpu : usr=0.04%, sys=1.36%, ctx=802, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=783/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=801792KB, aggrb=80163KB/s, minb=82087KB/s, maxb=82087KB/s, mint=10002msec, maxt=10002msec
Disk stats (read/write):
dm-0: ios=12416/0, merge=0/0, ticks=108252/0, in_queue=108252, util=98.82%, aggrios=12528/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
md0: ios=12528/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=522/0, aggrmerge=0/0, aggrticks=4547/0, aggrin_queue=4547, aggrutil=55.80%
sdb: ios=637/0, merge=0/0, ticks=5492/0, in_queue=5492, util=54.41%
sdc: ios=634/0, merge=0/0, ticks=5320/0, in_queue=5320, util=52.71%
sdd: ios=631/0, merge=0/0, ticks=5588/0, in_queue=5588, util=55.37%
sde: ios=629/0, merge=0/0, ticks=5584/0, in_queue=5584, util=55.33%
sdf: ios=620/0, merge=0/0, ticks=5552/0, in_queue=5552, util=55.01%
sdg: ios=610/0, merge=0/0, ticks=5420/0, in_queue=5420, util=53.70%
sdh: ios=631/0, merge=0/0, ticks=5468/0, in_queue=5468, util=54.18%
sdi: ios=619/0, merge=0/0, ticks=5416/0, in_queue=5416, util=53.66%
sdj: ios=630/0, merge=0/0, ticks=5424/0, in_queue=5424, util=53.74%
sdk: ios=630/0, merge=0/0, ticks=5564/0, in_queue=5564, util=55.13%
sdl: ios=607/0, merge=0/0, ticks=5200/0, in_queue=5200, util=51.52%
sdm: ios=637/0, merge=0/0, ticks=5544/0, in_queue=5544, util=54.93%
sdn: ios=619/0, merge=0/0, ticks=5416/0, in_queue=5416, util=53.66%
sdo: ios=640/0, merge=0/0, ticks=5568/0, in_queue=5568, util=55.17%
sdp: ios=636/0, merge=0/0, ticks=5632/0, in_queue=5632, util=55.80%
sdq: ios=615/0, merge=0/0, ticks=5376/0, in_queue=5376, util=53.26%
sdr: ios=622/0, merge=0/0, ticks=5284/0, in_queue=5284, util=52.35%
sds: ios=618/0, merge=0/0, ticks=5428/0, in_queue=5428, util=53.78%
sdt: ios=631/0, merge=0/0, ticks=5528/0, in_queue=5528, util=54.78%
sdu: ios=632/0, merge=0/0, ticks=5340/0, in_queue=5340, util=52.91%
sdaf: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdag: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdah: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdai: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
----
test7: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test7: (groupid=0, jobs=1): err= 0: pid=13783
read : io=505856KB, bw=50545KB/s, iops=49 , runt= 10008msec
clat (msec): min=11 , max=31 , avg=20.25, stdev= 2.89
lat (msec): min=11 , max=31 , avg=20.25, stdev= 2.89
clat percentiles (usec):
| 1.00th=[13760], 5.00th=[15552], 10.00th=[16768], 20.00th=[17792],
| 30.00th=[18560], 40.00th=[19328], 50.00th=[20096], 60.00th=[20864],
| 70.00th=[21888], 80.00th=[22912], 90.00th=[23936], 95.00th=[24960],
| 99.00th=[26240], 99.50th=[27776], 99.90th=[31616]
bw (KB/s) : min=48282, max=52965, per=99.93%, avg=50507.63, stdev=1511.43
lat (msec) : 20=47.37%, 50=52.63%
cpu : usr=0.00%, sys=0.72%, ctx=501, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=494/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=505856KB, aggrb=50545KB/s, minb=51758KB/s, maxb=51758KB/s, mint=10008msec, maxt=10008msec
Disk stats (read/write):
md0: ios=7824/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=395/0, aggrmerge=0/0, aggrticks=5965/0, aggrin_queue=5965, aggrutil=61.25%
sdb: ios=396/0, merge=0/0, ticks=5732/0, in_queue=5732, util=56.77%
sdba: ios=405/0, merge=0/0, ticks=6124/0, in_queue=6124, util=60.65%
sdbb: ios=391/0, merge=0/0, ticks=6016/0, in_queue=6016, util=59.58%
sdbc: ios=398/0, merge=0/0, ticks=6184/0, in_queue=6184, util=61.25%
sdbd: ios=397/0, merge=0/0, ticks=5928/0, in_queue=5928, util=58.72%
sdbg: ios=393/0, merge=0/0, ticks=6084/0, in_queue=6084, util=60.26%
sdbh: ios=386/0, merge=0/0, ticks=5884/0, in_queue=5884, util=58.28%
sdbi: ios=378/0, merge=0/0, ticks=5856/0, in_queue=5856, util=58.00%
sdbl: ios=379/0, merge=0/0, ticks=5948/0, in_queue=5948, util=58.91%
sdbr: ios=394/0, merge=0/0, ticks=5988/0, in_queue=5988, util=59.31%
sdbs: ios=398/0, merge=0/0, ticks=6072/0, in_queue=6072, util=60.14%
sdc: ios=390/0, merge=0/0, ticks=5712/0, in_queue=5712, util=56.58%
sdd: ios=402/0, merge=0/0, ticks=5912/0, in_queue=5912, util=58.56%
sde: ios=401/0, merge=0/0, ticks=6140/0, in_queue=6140, util=60.82%
sdf: ios=399/0, merge=0/0, ticks=5844/0, in_queue=5844, util=57.88%
sdg: ios=402/0, merge=0/0, ticks=6032/0, in_queue=6032, util=59.75%
sdh: ios=402/0, merge=0/0, ticks=5976/0, in_queue=5976, util=59.19%
sdi: ios=393/0, merge=0/0, ticks=5952/0, in_queue=5952, util=58.95%
sdj: ios=402/0, merge=0/0, ticks=6088/0, in_queue=6088, util=60.30%
sdk: ios=398/0, merge=0/0, ticks=5828/0, in_queue=5828, util=57.73%
----
test8: (g=0): rw=randread, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 2.0.4
Starting 1 process
test8: (groupid=0, jobs=1): err= 0: pid=14153
read : io=707584KB, bw=70751KB/s, iops=69 , runt= 10001msec
clat (msec): min=8 , max=607 , avg=14.47, stdev=31.84
lat (msec): min=8 , max=607 , avg=14.47, stdev=31.84
clat percentiles (msec):
| 1.00th=[ 11], 5.00th=[ 12], 10.00th=[ 12], 20.00th=[ 12],
| 30.00th=[ 13], 40.00th=[ 13], 50.00th=[ 13], 60.00th=[ 13],
| 70.00th=[ 14], 80.00th=[ 14], 90.00th=[ 15], 95.00th=[ 15],
| 99.00th=[ 20], 99.50th=[ 22], 99.90th=[ 611]
bw (KB/s) : min= 1686, max=82483, per=100.00%, avg=71972.26, stdev=24807.51
lat (msec) : 10=0.29%, 20=98.84%, 50=0.58%, 750=0.29%
cpu : usr=0.04%, sys=1.20%, ctx=711, majf=0, minf=284
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=691/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=707584KB, aggrb=70751KB/s, minb=72449KB/s, maxb=72449KB/s, mint=10001msec, maxt=10001msec
Disk stats (read/write):
dm-0: ios=10944/0, merge=0/0, ticks=101672/0, in_queue=101712, util=98.78%, aggrios=11056/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
md0: ios=11056/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=552/0, aggrmerge=0/0, aggrticks=5137/0, aggrin_queue=5137, aggrutil=54.71%
sdb: ios=565/0, merge=0/0, ticks=4988/0, in_queue=4988, util=49.41%
sdba: ios=560/0, merge=0/0, ticks=5004/0, in_queue=5004, util=49.56%
sdbb: ios=554/0, merge=0/0, ticks=5488/0, in_queue=5488, util=54.36%
sdbc: ios=552/0, merge=0/0, ticks=5524/0, in_queue=5524, util=54.71%
sdbd: ios=545/0, merge=0/0, ticks=4744/0, in_queue=4744, util=46.99%
sdbg: ios=535/0, merge=0/0, ticks=5376/0, in_queue=5376, util=53.25%
sdbh: ios=558/0, merge=0/0, ticks=4960/0, in_queue=4960, util=49.13%
sdbi: ios=542/0, merge=0/0, ticks=4764/0, in_queue=4764, util=47.19%
sdbl: ios=556/0, merge=0/0, ticks=5480/0, in_queue=5480, util=54.28%
sdbr: ios=555/0, merge=0/0, ticks=5452/0, in_queue=5452, util=54.01%
sdbs: ios=540/0, merge=0/0, ticks=5408/0, in_queue=5408, util=53.57%
sdc: ios=569/0, merge=0/0, ticks=5340/0, in_queue=5340, util=52.90%
sdd: ios=549/0, merge=0/0, ticks=4888/0, in_queue=4888, util=48.42%
sde: ios=568/0, merge=0/0, ticks=5016/0, in_queue=5016, util=49.69%
sdf: ios=560/0, merge=0/0, ticks=5032/0, in_queue=5032, util=49.85%
sdg: ios=544/0, merge=0/0, ticks=4744/0, in_queue=4744, util=46.99%
sdh: ios=547/0, merge=0/0, ticks=4860/0, in_queue=4860, util=48.14%
sdi: ios=546/0, merge=0/0, ticks=5320/0, in_queue=5320, util=52.70%
sdj: ios=554/0, merge=0/0, ticks=5452/0, in_queue=5452, util=54.01%
sdk: ios=557/0, merge=0/0, ticks=4912/0, in_queue=4912, util=48.66%
* Re: Striping does not increase performance.
2012-03-12 12:34 Striping does not increase performance Caspar Smit
@ 2012-03-12 12:57 ` Erwan MAS
2012-03-12 13:02 ` Caspar Smit
2012-03-12 14:20 ` David Brown
1 sibling, 1 reply; 11+ messages in thread
From: Erwan MAS @ 2012-03-12 12:57 UTC (permalink / raw)
To: Caspar Smit; +Cc: linux-raid, fio
On Mon, Mar 12, 2012 at 01:34:08PM +0100, Caspar Smit wrote:
> Hi all,
>
> I don't know exactly which mailing lists to use for this one, so I hope
> I used the right ones.
>
> I did some performance testing on a new system and found out some
> things I couldn't explain or didn't expect.
> At the end are some questions I hope to get answered to explain the
> things I'm seeing in the tests.
>
> First of all a description of my setup.
>
> The server is a 36-bay 3.5" Supermicro chassis filled with 36x 2TB
> SATA 7200 RPM disks.
> [../..]
Please check the alignment at each level of storage.
These disks present a 512-byte block size to the controller, but
internally they use a 4k block size.
You must check that the alignment is correct for:
disk partitioning
the md device
the LVM PV
--
____________________________________________________________
/ Erwan MAS /\
| mailto:erwan@mas.nom.fr |_/
___|________________________________________________________ |
\___________________________________________________________\__/
* Re: Striping does not increase performance.
2012-03-12 12:57 ` Erwan MAS
@ 2012-03-12 13:02 ` Caspar Smit
2012-03-12 13:58 ` Erwan MAS
2012-03-12 14:33 ` Peter Grandi
0 siblings, 2 replies; 11+ messages in thread
From: Caspar Smit @ 2012-03-12 13:02 UTC (permalink / raw)
To: Erwan MAS; +Cc: linux-raid, fio
Hi Erwan,
I do not use partitions on the drives, so the whole disk /dev/sdb is
used as the md component device; I was under the impression that when not
using partitions the alignment is correct, or am I wrong?
Kind regards,
Caspar Smit
On 12 March 2012 at 13:57, Erwan MAS <erwan@mas.nom.fr> wrote:
> On Mon, Mar 12, 2012 at 01:34:08PM +0100, Caspar Smit wrote:
>> Hi all,
>>
>> I don't know exactly which mailing lists to use for this one, so I hope
>> I used the right ones.
>>
>> I did some performance testing on a new system and found out some
>> things I couldn't explain or didn't expect.
>> At the end are some questions I hope to get answered to explain the
>> things I'm seeing in the tests.
>>
>> First of all a description of my setup.
>>
>> The server is a 36-bay 3.5" Supermicro chassis filled with 36x 2TB
>> SATA 7200 RPM disks.
>> [../..]
>
> Please check the alignment at each level of storage.
> These disks present a 512-byte block size to the controller, but
> internally they use a 4k block size.
>
> You must check that the alignment is correct for:
> disk partitioning
> the md device
> the LVM PV
>
> --
> ____________________________________________________________
> / Erwan MAS /\
> | mailto:erwan@mas.nom.fr |_/
> ___|________________________________________________________ |
> \___________________________________________________________\__/
* Re: Striping does not increase performance.
2012-03-12 13:02 ` Caspar Smit
@ 2012-03-12 13:58 ` Erwan MAS
2012-03-12 14:34 ` Jiri Horky
2012-03-12 14:33 ` Peter Grandi
1 sibling, 1 reply; 11+ messages in thread
From: Erwan MAS @ 2012-03-12 13:58 UTC (permalink / raw)
To: Caspar Smit; +Cc: linux-raid, fio
On Mon, Mar 12, 2012 at 02:02:36PM +0100, Caspar Smit wrote:
> Hi Erwan,
>
> I do not use partitions on the drives, so the whole disk /dev/sdb is
> used as the md component device; I was under the impression that when not
> using partitions the alignment is correct, or am I wrong?
>
For me, that is correct.
--
____________________________________________________________
/ Erwan MAS /\
| mailto:erwan@mas.nom.fr |_/
___|________________________________________________________ |
\___________________________________________________________\__/
* Re: Striping does not increase performance.
2012-03-12 13:58 ` Erwan MAS
@ 2012-03-12 14:34 ` Jiri Horky
2012-03-12 16:23 ` John Robinson
0 siblings, 1 reply; 11+ messages in thread
From: Jiri Horky @ 2012-03-12 14:34 UTC (permalink / raw)
To: Erwan MAS; +Cc: Caspar Smit, linux-raid, fio
[-- Attachment #1: Type: text/plain, Size: 1577 bytes --]
Hi,
as for alignment, please also check the md metadata format version (the -e
switch of mdadm). The version that ensures alignment is 1.0, which
places the metadata at the back of the drive. I am not an
expert on LVM, but I would suspect that there might be similar
problems/options to check.
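Something like this should show it (a sketch; the exact field names vary
a bit between mdadm versions):

mdadm --detail /dev/md0 | grep Version
mdadm --examine /dev/sdb | grep -i offset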
I would first verify the physical sector size of the drives you use. Please
have a look at the utility at
http://www.farm.particle.cz/blksztester.tar.gz. It directly writes
e.g. 4K blocks to random positions on a block device at a chosen
alignment. Run it with no arguments to get some help in English (the
README is in Czech :).
If you get similar speeds for 4k-aligned and 4k-misaligned writes, the
disks use a 512-byte physical sector size. The difference should be
around 300% in the case of 4K sectors. You may try to google for the
sector size by product number, but I have seen cases where official
vendor websites were wrong. Please note that you should not connect the
drive under test through a RAID controller with its own logic and/or
cache, which may skew the results, but rather to a plain motherboard
port.
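If your kernel exports it (recent ones do), the quickest first check is
simply:

cat /sys/block/sdb/queue/physical_block_size
cat /sys/block/sdb/queue/logical_block_size

though drives that lie about their physical sector size are exactly the
ones the write test above will catch.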
I had several headaches with 4k-block drives in the past as well...
Cheers
Jiri Horky
On 03/12/2012 02:58 PM, Erwan MAS wrote:
> On Mon, Mar 12, 2012 at 02:02:36PM +0100, Caspar Smit wrote:
>> Hi Erwan,
>>
>> I do not use partitions on the drives, so the whole disk /dev/sdb is
>> used as the md component device; I was under the impression that when not
>> using partitions the alignment is correct, or am I wrong?
>>
> For me, that is correct.
>
* Re: Striping does not increase performance.
2012-03-12 14:34 ` Jiri Horky
@ 2012-03-12 16:23 ` John Robinson
0 siblings, 0 replies; 11+ messages in thread
From: John Robinson @ 2012-03-12 16:23 UTC (permalink / raw)
To: Jiri Horky; +Cc: Erwan MAS, Caspar Smit, linux-raid, fio
On 12/03/2012 14:34, Jiri Horky wrote:
> Hi,
>
> as for alignment, please also check the md metadata format version (the -e
> switch of mdadm). The version that ensures alignment is 1.0, which
> places the metadata at the back of the drive.
As far as I can tell from looking at the source, ever since the 1.x
metadata types were introduced (with mdadm 2.0, in August 2005):
* the data offset on 1.1 and 1.2 arrays has always been 4K-aligned,
* the superblock offset for 1.0 arrays has always been 4K-aligned,
* the bitmap offset for all 1.x arrays has always been 4K-aligned.
As of mdadm 3.1.2 (March 2010), the data offset for new 1.1 and 1.2
arrays is always 1MB (2048 sectors), so they're also 1M-aligned.
Cheers,
John.
* Re: Striping does not increase performance.
2012-03-12 13:02 ` Caspar Smit
2012-03-12 13:58 ` Erwan MAS
@ 2012-03-12 14:33 ` Peter Grandi
2012-03-13 11:44 ` Caspar Smit
1 sibling, 1 reply; 11+ messages in thread
From: Peter Grandi @ 2012-03-12 14:33 UTC (permalink / raw)
To: Linux RAID
>>> The server is a 36-bay 3.5" Supermicro chassis filled with
>>> 36x 2TB SATA 7200 RPM disks.
[ ... ]
>>> I used a bandwidth random read test I found on the Fusion IO
>>> website. After every test I ran: sync; echo 3 >/proc/sys/vm/drop_caches;
Given that the FIO parameters below specify O_DIRECT and cache
invalidation (which is itself redundant with O_DIRECT), 'sync' or
dropping caches are pointless. Which makes me suspect that you are
unclear as to what you are measuring, or what you want to measure,
an impression very much reinforced by several other details.
But congratulations on choosing FIO, it is one of the few tools
that can, used advisedly, give somewhat relevant numbers.
>>> Once booted, I created 3 raid6 MD devices of 10 disks each
>>> (16TB net each) with 6 global hot spares in the same
>>> spare-group. All MD devices have a chunk size of 64KB
>>> fio --name=test1 --ioengine=sync --direct=1 --rw=randread --bs=1m
>>> --runtime=10 --filename=/dev/md0 --iodepth=1 --invalidate=1
>>> read : io=518144KB, bw=51726KB/s, iops=50 , runt= 10017msec
So you have a stripe size of 64KiB*8 => 512KiB. Each random read
takes one seek to position the heads, and then reads two stripes,
which involves at least one and perhaps two head alignment times
(probably 1/2 of a full rotation).
So in figures each random read should cost about 10-15ms of cylinder
positioning time, plus a 1MiB read off 8 disks each capable of
around 90MB/s (averaged between inner and outer cylinders),
which is another 2ms, plus around 1-2 times 1/2 rotational
latency, which is probably another 2-3ms, for a total of around
15-20ms per transaction on average.
Your results seem pretty much in line with this, with
small variations.
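Back of the envelope, with those assumed figures:

  ~10-15 ms   cylinder positioning (seek)
  ~ 2- 3 ms   1-2 times 1/2 rotational latency
  ~  2   ms   transfer: 1MiB over 8 data disks = 128KiB/disk at ~90MB/s
  ----------
  ~15-20 ms   per read  =>  roughly 50-65 reads/s  =>  50-65MB/s at 1MiB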
You have just discovered that single threaded/depth random
transfers on a RAID set are not much faster than on a single
disk. :-)
[ ... ]
>>> For the next test I wanted to see if I could double the
>>> performance by striping an LV over 2 md's (so instead of
>>> using 10 disks/spindles, use 20 disks/spindles)
That's an astonishing expectation. It is hard for me to imagine
why reading that 1MiB twice as fast would give significantly
better "performance" (whatever you mean by that) when you have a
seek+align interval at the same frequency, and that's the
dominant cost.
[ ... ]
>>> fio --name=test3 --ioengine=sync --direct=1 --rw=randread --bs=1m
>>> --runtime=10 --filename=/dev/dm-0 --iodepth=1 --invalidate=1
>>> Now things are getting interesting:
>>> read : io=769024KB, bw=76849KB/s, iops=75 , runt= 10007msec
>>> Now the total IO's in 10 seconds are 16x larger than
>>> before. [ ... ] The IO's per disk seem to be in 64KB blocks
>>> still only now with a large MERGE figure besides it.
That's not terribly interesting. It has second-order effects, but
is otherwise fairly irrelevant. You are doing O_DIRECT IO on the LV,
but then the DM layer can rearrange things. The disks can do IO
in 4KiB sectors only, and the alternative is between one command
with a count of N or N commands with a count of 1, and that's not
that big a difference. Also because SATA specifies a rather
primitive ability to queue commands.
>>> Each disk now does around 60 IOPS!
Much the same as before in effect.
Please note that when people talk "IOPS" what they really mean is
"fully random IOPS", that is SEEKS. You can get a lot of IOPS
even on hard disks if they are sequential and short. What matters
is numbers of random seeks. Multiplying by N the "IOPS" by doing
transfers in 1/4 the size is insignificant.
[ ... various other attempts ... ]
>>> 1) Am I overlooking/not understanding something obvious about why I
>>> can't improve performance on the system?
What kind of "performance" do you expect? Your tests are almost
entirely dominated by single threaded synchronous seeks, and you
are getting more or less what the hw can deliver for those, with
small variations depending on layering of IO scheduling.
>>> 2) Why are the LVM tests performing better as opposed to
>>> only using MD(s)?
Slightly different scheduling as various layers rearrange the
flow and timing of requests.
>>> 3) Why is the performance in test3 split between the two PV's
>>> and not aggregated? Is there a bottleneck somewhere, and if so,
>>> how can I find it?
"Doctor, if I hammer a nail through my hand it hurts a lot"
"Don't do it" :-).
>>> 4) Why are the IO's suddenly split into 4KB blocks when using
>>> striping/raid0? All chunk/block/stripe sizes are 64KB.
IO layers can rearrange things as much as they please. Even
O_DIRECT really just means "no page cache", not "do physical IO
one-to-one with logical IO", even if currently under Linux it
usually implies that.
>>> 5) Any recommendations on how to improve performance with this
>>> configuration so it is not limited to the performance of 10 disks?
Again, what does "performance" mean to you? For which workload
profile?
>> Please check the alignment at each level of storage, because
>> these disks present a 512-byte block size to the controller
>> but internally use a 4k block size.
That matters almost only for *writes*. Unaligned reads cost a lot
less, and on a 128KiB transaction size (two chunks on each disk)
the extra cost (two extra sector reads) should be unimportant.
> [ ... ] do not use partitions on the drives, so the whole disk
> /dev/sdb is used as the md component device; I was under the
> impression that when not using partitions the alignment is
> correct, or am I wrong?
Not necessarily, but usually yes.
* Re: Striping does not increase performance.
2012-03-12 14:33 ` Peter Grandi
@ 2012-03-13 11:44 ` Caspar Smit
0 siblings, 0 replies; 11+ messages in thread
From: Caspar Smit @ 2012-03-13 11:44 UTC (permalink / raw)
To: Linux RAID
Peter,
Thank you for your comments; I'm obviously pretty green in this field.
I'll try to decipher your remarks and hope to learn a lot from them.
Caspar
On 12 March 2012 at 15:33, Peter Grandi <pg@lxra2.to.sabi.co.uk> wrote:
>>>> The server is a 36-bay 3.5" Supermicro chassis filled with
>>>> 36x 2TB SATA 7200 RPM disks.
> [ ... ]
>
>>>> I used a bandwidth random read test I found on the Fusion IO
>>>> website. After every test I ran: sync; echo 3 >/proc/sys/vm/drop_caches;
>
> Given that the FIO parameters below specify O_DIRECT and cache
> invalidation (which is itself redundant with O_DIRECT), 'sync' or
> dropping caches are pointless. Which makes me suspect that you are
> unclear as to what you are measuring, or what you want to measure,
> an impression very much reinforced by several other details.
>
> But congratulations on choosing FIO, it is one of the few tools
> that can, used advisedly, give somewhat relevant numbers.
>
>>>> Once booted, I created 3 raid6 MD devices of 10 disks each
>>>> (16TB net each) with 6 global hot spares in the same
>>>> spare-group. All MD devices have a chunk size of 64KB
>>>> fio --name=test1 --ioengine=sync --direct=1 --rw=randread --bs=1m
>>>> --runtime=10 --filename=/dev/md0 --iodepth=1 --invalidate=1
>>>> read : io=518144KB, bw=51726KB/s, iops=50 , runt= 10017msec
>
> So you have a stripe size of 64KiB*8 => 512KiB. Each random read
> takes one seek to position the heads, and then reads two stripes,
> which involves at least one and perhaps two head alignment times
> (probably 1/2 of a full rotation).
>
> So in figures each random read should cost about 10-15ms of cylinder
> positioning time, plus a 1MiB read off 8 disks each capable of
> around 90MB/s (averaged between inner and outer cylinders),
> which is another 2ms, plus around 1-2 times 1/2 rotational
> latency, which is probably another 2-3ms, for a total of around
> 15-20ms per transaction on average.
>
> Your results seem pretty much in line with this, with
> small variations.
>
> You have just discovered that single threaded/depth random
> transfers on a RAID set are not much faster than on a single
> disk. :-)
>
> [ ... ]
>
>>>> For the next test I wanted to see if I could double the
>>>> performance by striping an LV over 2 md's (so instead of
>>>> using 10 disks/spindles, use 20 disks/spindles)
>
> That's an astonishing expectation. It is hard for me to imagine
> why reading that 1MiB twice as fast would give significantly
> better "performance" (whatever you mean by that) when you have a
> seek+align interval at the same frequency, and that's the
> dominant cost.
>
> [ ... ]
>
>>>> fio --name=test3 --ioengine=sync --direct=1 --rw=randread --bs=1m
>>>> --runtime=10 --filename=/dev/dm-0 --iodepth=1 --invalidate=1
>>>> Now things are getting interesting:
>>>> read : io=769024KB, bw=76849KB/s, iops=75 , runt= 10007msec
>>>> Now the total IO's in 10 seconds are 16x larger than
>>>> before. [ ... ] The IO's per disk seem to be in 64KB blocks
>>>> still only now with a large MERGE figure besides it.
>
> That's not terribly interesting. It has second-order effects, but
> is otherwise fairly irrelevant. You are doing O_DIRECT IO on the LV,
> but then the DM layer can rearrange things. The disks can do IO
> in 4KiB sectors only, and the alternative is between one command
> with a count of N or N commands with a count of 1, and that's not
> that big a difference. Also because SATA specifies a rather
> primitive ability to queue commands.
>
>>>> Each disk now does around 60 IOPS!
>
> Much the same as before in effect.
>
> Please note that when people talk "IOPS" what they really mean is
> "fully random IOPS", that is SEEKS. You can get a lot of IOPS
> even on hard disks if they are sequential and short. What matters
> is numbers of random seeks. Multiplying by N the "IOPS" by doing
> transfers in 1/4 the size is insignificant.
>
> [ ... various other attempts ... ]
>
>>>> 1) Am I overlooking/not understanding something obvious about why I
>>>> can't improve performance on the system?
>
> What kind of "performance" do you expect? Your tests are almost
> entirely dominated by single threaded synchronous seeks, and you
> are getting more or less what the hw can deliver for those, with
> small variations depending on layering of IO scheduling.
>
>>>> 2) Why are the LVM tests performing better as opposed to
>>>> only using MD(s)?
>
> Slightly different scheduling as various layers rearrange the
> flow and timing of requests.
>
>>>> 3) Why is the performance in test3 split between the two PV's
>>>> and not aggregated? Is there a bottleneck somewhere, and if so,
>>>> how can I find it?
>
> "Doctor, if I hammer a nail through my hand it hurts a lot"
> "Don't do it" :-).
>
>>>> 4) Why are the IO's suddenly split into 4KB blocks when using
>>>> striping/raid0? All chunk/block/stripe sizes are 64KB.
>
> IO layers can rearrange things as much as they please. Even
> O_DIRECT really just means "no page cache", not "do physical IO
> one-to-one with logical IO", even if currently under Linux it
> usually implies that.
>
>>>> 5) Any recommendations on how to improve performance with this
>>>> configuration so it is not limited to the performance of 10 disks?
>
> Again, what does "performance" mean to you? For which workload
> profile?
>
>>> Please check the alignment at each level of storage, because
>>> these disks present a 512-byte block size to the controller
>>> but internally use a 4k block size.
>
> That matters almost only for *writes*. Unaligned reads cost a lot
> less, and on a 128KiB transaction size (two chunks on each disk)
> the extra cost (two extra sector reads) should be unimportant.
>
>> [ ... ] do not use partitions on the drives, so the whole disk
>> /dev/sdb is used as the md component device; I was under the
>> impression that when not using partitions the alignment is
>> correct, or am I wrong?
>
> Not necessarily, but usually yes.
* Re: Striping does not increase performance.
2012-03-12 12:34 Striping does not increase performance Caspar Smit
2012-03-12 12:57 ` Erwan MAS
@ 2012-03-12 14:20 ` David Brown
2012-03-13 11:55 ` Caspar Smit
1 sibling, 1 reply; 11+ messages in thread
From: David Brown @ 2012-03-12 14:20 UTC (permalink / raw)
To: linux-raid
On 12/03/2012 13:34, Caspar Smit wrote:
> Hi all,
>
> I don't know exactly which mailing lists to use for this one, so I hope
> I used the right ones.
>
> I did some performance testing on a new system and found out some
> things I couldn't explain or didn't expect.
> At the end are some questions I hope to get answered to explain the
> things I'm seeing in the tests.
>
>
> For the next test I wanted to see if I could double the performance by
> striping an LV over 2 md's (so instead of using 10 disks/spindles, use
> 20 disks/spindles).
>
> So I added md1 to the VG as a PV, created a fresh LV striped across the
> two PV's using a 64KB stripe size, and ran the test again.
>
>
> Now the total IO's in 10 seconds are 16x larger than before: 190464 /
> 10 = 19046.4, 19046.4 / 16 = 1190.4, and 1190.4 / 16 = the reported 75
> IOPS above.
> So the 64KB blocks seem to be split into 4KB blocks (64 / 16 = 4),
> which results in a much larger number of total IO's.
> The IO's per disk still seem to be in 64KB blocks, only now with a
> large MERGE figure beside them. (Are the 4KB blocks now merged back into
> 64KB blocks?)
>
LVM will stripe the data between the two md's with a default stripe size
of 4K - thus the first 4K will go to md0, the second to md1, etc. This
is obviously terribly inefficient. For 8+2 raid6 with 64KB chunks, you
want a stripe size of 8x64K = 512KB when you create the logical volume.
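So something like this sketch (reusing the assumed VG/LV names from
earlier) should hand each md requests that match a full raid6 data stripe:

lvcreate -L 2T -i 2 -I 512 -n lv0 vg0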
> The performance does not double but stays the same as with 1 MD set;
> only the total IO's are spread among the MD's. Each disk now does
> around 60 IOPS!
>
> I still wanted to see if I could double the performance and thought it
> might have something to do with LVM striping, so I ditched LVM and
> created a RAID0 (md6) over md0 and md1, again with a chunk size of
> 64KB.
>
Similarly here, you want your chunk sizes to fit a stripe on the raid
devices, not to fit the underlying devices. So try with a chunk size of
512KB (or higher).
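I.e., something along the lines of (sketch):

mdadm --create /dev/md6 --level=0 --raid-devices=2 --chunk=512 \
      /dev/md0 /dev/md1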
Regards,
David
* Re: Striping does not increase performance.
2012-03-12 14:20 ` David Brown
@ 2012-03-13 11:55 ` Caspar Smit
2012-03-13 14:12 ` David Brown
0 siblings, 1 reply; 11+ messages in thread
From: Caspar Smit @ 2012-03-13 11:55 UTC (permalink / raw)
To: linux-raid
On 12 March 2012 at 15:20, David Brown <david@westcontrol.com> wrote:
> On 12/03/2012 13:34, Caspar Smit wrote:
>>
>> Hi all,
>>
>> I don't know exactly which mailing lists to use for this one, so I hope
>> I used the right ones.
>>
>> I did some performance testing on a new system and found out some
>> things I couldn't explain or didn't expect.
>> At the end are some questions I hope to get answered to explain the
>> things I'm seeing in the tests.
>>
>>
>> For the next test I wanted to see if I could double the performance by
>> striping an LV over 2 md's (so instead of using 10 disks/spindles, use
>> 20 disks/spindles).
>>
>> So I added md1 to the VG as a PV, created a fresh LV striped across the
>> two PV's using a 64KB stripe size, and ran the test again.
>>
>>
>> Now the total IO's in 10 seconds are 16x larger than before: 190464 /
>> 10 = 19046.4, 19046.4 / 16 = 1190.4, and 1190.4 / 16 = the reported 75
>> IOPS above.
>> So the 64KB blocks seem to be split into 4KB blocks (64 / 16 = 4),
>> which results in a much larger number of total IO's.
>> The IO's per disk still seem to be in 64KB blocks, only now with a
>> large MERGE figure beside them. (Are the 4KB blocks now merged back into
>> 64KB blocks?)
>>
>
> LVM will stripe the data between the two md's with a default stripe size of
> 4K - thus the first 4K will go to md0, the second to md1, etc. This is
> obviously terribly inefficient. For 8+2 raid6 with 64KB chunks, you want a
> stripe size of 8x64K = 512KB when you create the logical volume.
Ok, that makes sense.
But if I, for instance, had created a 10-disk RAID5 md with a 64KB chunk
size, it would have been a stripe size of 9x64KB = 576KB, which is not
possible. So I have to make sure I always create a raid5/6 md where
the stripe size is a power of 2 when I want to use raid0 and/or LVM
striping, correct?
Caspar
>
>
>> The performance does not double but stays the same as with 1 MD set;
>> only the total IO's are spread among the MD's. Each disk now does
>> around 60 IOPS!
>>
>> I still wanted to see if I could double the performance and thought it
>> might have something to do with LVM striping, so I ditched LVM and
>> created a RAID0 (md6) over md0 and md1, again with a chunk size of
>> 64KB.
>>
>
> Similarly here, you want your chunk sizes to fit a stripe on the raid
> devices, not to fit the underlying devices. So try with a chunk size of
> 512KB (or higher).
>
> Regards,
>
> David
>
>
* Re: Striping does not increase performance.
2012-03-13 11:55 ` Caspar Smit
@ 2012-03-13 14:12 ` David Brown
0 siblings, 0 replies; 11+ messages in thread
From: David Brown @ 2012-03-13 14:12 UTC (permalink / raw)
To: linux-raid
On 13/03/2012 12:55, Caspar Smit wrote:
> On 12 March 2012 at 15:20, David Brown <david@westcontrol.com> wrote:
>> On 12/03/2012 13:34, Caspar Smit wrote:
>>>
>>> Hi all,
>>>
>>> I don't know exactly which mailing lists to use for this one, so I hope
>>> I used the right ones.
>>>
>>> I did some performance testing on a new system and found out some
>>> things I couldn't explain or didn't expect.
>>> At the end are some questions I hope to get answered to explain the
>>> things I'm seeing in the tests.
>>>
>>>
>>> For the next test I wanted to see if I could double the performance by
>>> striping an LV over 2 md's (so instead of using 10 disks/spindles, use
>>> 20 disks/spindles).
>>>
>>> So I added md1 to the VG as a PV, created a fresh LV striped across the
>>> two PV's using a 64KB stripe size, and ran the test again.
>>>
>>>
>>> Now the total IO's in 10 seconds are 16x larger than before: 190464 /
>>> 10 = 19046.4, 19046.4 / 16 = 1190.4, and 1190.4 / 16 = the reported 75
>>> IOPS above.
>>> So the 64KB blocks seem to be split into 4KB blocks (64 / 16 = 4),
>>> which results in a much larger number of total IO's.
>>> The IO's per disk still seem to be in 64KB blocks, only now with a
>>> large MERGE figure beside them. (Are the 4KB blocks now merged back into
>>> 64KB blocks?)
>>>
>>
>> LVM will stripe the data between the two md's with a default stripe size of
>> 4K - thus the first 4K will go to md0, the second to md1, etc. This is
>> obviously terribly inefficient. For 8+2 raid6 with 64KB chunks, you want a
>> stripe size of 8x64K = 512KB when you create the logical volume.
>
> Ok, that makes sense.
> But if I, for instance, had created a 10-disk RAID5 md with a 64KB chunk
> size, it would have been a stripe size of 9x64KB = 576KB, which is not
> possible. So I have to make sure I always create a raid5/6 md where
> the stripe size is a power of 2 when I want to use raid0 and/or LVM
> striping, correct?
>
LVM raid is limited compared to mdadm. As far as I know, mdadm raid
chunk sizes are not limited to a power of 2.
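If that is so, then in theory a sketch like this could match the 576KB
stripe (untested; it assumes both your mdadm and the raid0 module accept
a non-power-of-two chunk):

mdadm --create /dev/md6 --level=0 --raid-devices=2 --chunk=576 \
      /dev/md0 /dev/md1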
Note, however, that I have no experience with more than 4 disks in an
array, only some theoretical knowledge. So my suggestions are only
ideas to try - nothing is guaranteed correct. Usually someone else on
this list will jump in if I say something truly stupid, so changing
chunk sizes is perhaps worth a try.
Regards,
David