From: Peter Landmann <sfrazt@googlemail.com>
To: linux-raid@vger.kernel.org
Subject: RAID 5 doesn't scale
Date: Wed, 3 Apr 2013 11:00:48 +0000 (UTC)
Message-ID: <loom.20130403T122905-373@post.gmane.org>
Hi,
I wrote about this at http://article.gmane.org/gmane.linux.raid/42365 but want
to go into more detail. Maybe there is another problem, or a mistake in my
thinking.
Environment:
HW: AMD Phenom II 1055T 2,8 GHz, 8GB ram
Intel X25-M G2 Postville 80 GB SATA2 SSD
SW: kernel 3.4.0, but same performance with 3.8 from git and 3.9 from the "next" tree
distribution: debian sid
Raid Settings:
on each SSD a 10 GB partition is used, leaving 70 GB spare capacity
noop-scheduler
raid creation:
mdadm --create /dev/md9 --force --raid-devices=4 --chunk=64 --assume-clean \
  --level=5 /dev/sdb1 /dev/sdc1 ..
FIO settings:
bs=4096
iodepth=248
direct=1
continue_on_error=1
rw=randwrite
ioengine=libaio
norandommap
refill_buffers
group_reporting
[test1]
numjobs=1
Theoretical performance: on its own, without RAID, each SSD writes 20k IOPS
and reads 40k IOPS.
With RAID 5 and at least 4 SSDs, there are as many write operations as read
operations on each SSD. So a single SSD should deliver 13333 read and 13333
write operations per second.
Without RAID, a maximum performance of 140000 random read and 120000 random
write operations per second is achieved, so the hardware shouldn't be the
limiting factor for RAID 5.
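The theoretical figures can be reproduced with a simple device-time model. This is only a minimal sketch. It assumes (this is not stated above) that each 4K write costs 2 member reads plus 2 member writes on arrays of 4 or more SSDs, and 1 read plus 2 writes on a 3-SSD array. The function names and parameters are made up for illustration.

```python
def mixed_iops_per_ssd(read_iops=40_000, write_iops=20_000):
    """Rate x at which one SSD can serve x reads AND x writes per second.

    Solves x / read_iops + x / write_iops = 1 (the device is fully busy).
    """
    return 1.0 / (1.0 / read_iops + 1.0 / write_iops)


def theoretical_write_iops(n_ssd, read_iops=40_000, write_iops=20_000):
    """Upper bound on array random-write IOPS under the device-time model."""
    # Assumed member I/Os per logical 4K write (hypothetical, see lead-in):
    #   3 SSDs:  1 read + 2 writes
    #   4+ SSDs: 2 reads + 2 writes
    reads, writes = (1, 2) if n_ssd == 3 else (2, 2)
    device_seconds_per_write = reads / read_iops + writes / write_iops
    # n_ssd devices work in parallel, so n_ssd device-seconds are
    # available per wall-clock second.
    return n_ssd / device_seconds_per_write


if __name__ == "__main__":
    print("per SSD, mixed read/write pairs per second:", mixed_iops_per_ssd())
    for n in (3, 4, 5, 6):
        print(n, theoretical_write_iops(n))
```

Under these assumptions the model gives about 13333 read/write pairs per second per SSD, and array figures that agree with the theoretical column of the evaluation to within rounding.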
Evaluation: Random write in IOPS
#SSD  experimental  theoretical
3     14497.7       24000
4     14005         26666
5     17172.3       33333
6     19779         40000
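To put a number on the gap, the ratio of experimental to theoretical IOPS can be computed directly from the table. A small sketch follows; the variable and function names are only illustrative, and the values are copied from the table above.

```python
# Values copied from the evaluation table (random write, IOPS).
experimental = {3: 14497.7, 4: 14005.0, 5: 17172.3, 6: 19779.0}
theoretical = {3: 24000, 4: 26666, 5: 33333, 6: 40000}


def efficiency(n_ssd):
    """Fraction of the theoretical IOPS that was actually measured."""
    return experimental[n_ssd] / theoretical[n_ssd]


def scaling(table, start=3, end=6):
    """Speed-up factor when going from `start` SSDs to `end` SSDs."""
    return table[end] / table[start]


if __name__ == "__main__":
    for n in sorted(experimental):
        print(f"{n} SSDs: {efficiency(n):.1%} of theoretical")
    print(f"measured scaling 3 -> 6 SSDs:    {scaling(experimental):.2f}x")
    print(f"theoretical scaling 3 -> 6 SSDs: {scaling(theoretical):.2f}x")
```

The efficiency falls from roughly 60% with 3 SSDs to just under 50% with 6, and doubling the number of SSDs yields far less than the expected speed-up.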
Following stats and output for raid 5 with 6 SSDs
fio:
ssd10gbraid5rw: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio,
iodepth=248
2.0.8
Starting 1 process
ssd10gbraid5rw: (groupid=0, jobs=1): err= 0: pid=32400
Description : [SSD 10GB raid5 (mdadm) random write test]
write: io=988.0KB, bw=79133KB/s, iops=19783 , runt=5300335msec
slat (usec): min=3 , max=282137 , avg= 7.46, stdev=36.26
clat (usec): min=250 , max=338796K, avg=12525.28, stdev=136706.65
lat (usec): min=259 , max=338796K, avg=12533.00, stdev=136706.66
clat percentiles (usec):
| 1.00th=[ 1048], 5.00th=[ 2096], 10.00th=[ 2672], 20.00th=[ 3504],
| 30.00th=[ 4576], 40.00th=[ 6496], 50.00th=[ 8512], 60.00th=[11456],
| 70.00th=[15168], 80.00th=[20352], 90.00th=[28544], 95.00th=[33536],
| 99.00th=[39168], 99.50th=[41216], 99.90th=[56064], 99.95th=[292864],
| 99.99th=[309248]
bw (KB/s) : min= 6907, max=100088, per=100.00%, avg=79313.22, stdev=8802.19
lat (usec) : 500=0.05%, 750=0.27%, 1000=0.52%
lat (msec) : 2=3.52%, 4=20.98%, 10=30.25%, 20=23.99%, 50=20.29%
lat (msec) : 100=0.03%, 250=0.01%, 500=0.10%, 750=0.01%, 1000=0.01%
lat (msec) : 2000=0.01%, >=2000=0.01%
cpu : usr=7.75%, sys=21.55%, ctx=47382311, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued : total=r=0/w=0/d=104857847, short=r=0/w=0/d=0
errors : total=0, first_error=0/<(null)>
Run status group 0 (all jobs):
WRITE: io=409601MB, aggrb=79132KB/s, minb=79132KB/s, maxb=79132KB/s,
mint=5300335msec, maxt=5300335msec
Disk stats (read/write):
md9: ios=84/104857172, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=34949993/34951372, aggrmerge=401/512,
aggrticks=130838494/122401043, aggrin_queue=253198596, aggrutil=96.05%
sdb: ios=34950097/34951445, merge=400/511, ticks=130214828/121603063,
in_queue=251778978, util=95.86%
sdc: ios=34952941/34954281, merge=399/516, ticks=130736987/122271756,
in_queue=252969493, util=95.91%
sdd: ios=34943892/34945256, merge=417/527, ticks=131734001/123258071,
in_queue=254949447, util=95.89%
sde: ios=34954980/34956283, merge=367/473, ticks=125822046/117619660,
in_queue=243399327, util=95.95%
sdf: ios=34952583/34954080, merge=415/532, ticks=137200055/128624635,
in_queue=265784289, util=96.05%
sdg: ios=34945469/34946890, merge=408/517, ticks=129323047/121029077,
in_queue=250310045, util=95.99%
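The fio numbers above can be cross-checked against each other. The sketch below uses only figures copied from this run. It treats `aggrios` as the per-member average, which is an assumption based on the value being close to each member's own count. All names are illustrative.

```python
# Figures copied from the fio output above (RAID 5, 6 SSDs).
BLOCK_KB = 4                # bs=4096
FIO_IOPS = 19783            # iops= on the "write:" line
RUNTIME_S = 5300.335        # runt=5300335msec
MD_WRITES = 104857172       # md9 write ios
MEMBER_READS = 34949993     # aggrios reads, assumed per-member average
MEMBER_WRITES = 34951372    # aggrios writes, assumed per-member average
N_MEMBERS = 6


def member_ios_per_array_write(per_member_ios):
    """Member-device I/Os issued for each write submitted to md9."""
    return per_member_ios * N_MEMBERS / MD_WRITES


bandwidth_kb_s = FIO_IOPS * BLOCK_KB
reads_per_write = member_ios_per_array_write(MEMBER_READS)
writes_per_write = member_ios_per_array_write(MEMBER_WRITES)
member_write_iops = MEMBER_WRITES / RUNTIME_S

if __name__ == "__main__":
    print("bandwidth from iops * bs (KB/s):", bandwidth_kb_s)
    print("member reads per array write:  ", round(reads_per_write, 3))
    print("member writes per array write: ", round(writes_per_write, 3))
    print("write IOPS per member:         ", round(member_write_iops))
```

The bandwidth matches the reported aggrb, and each array write costs about two member reads and two member writes. Each SSD therefore serves only about 6600 reads and 6600 writes per second, roughly half of the 13333 it should sustain.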
top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4525 root 20 0 0 0 0 R 39,6 0,0 98:16.78 md9_raid5
32400 root 20 0 79716 1824 420 S 30,6 0,1 0:02.77 fio
29099 root 20 0 0 0 0 R 7,3 0,0 0:33.90 kworker/u:0
31740 root 20 0 0 0 0 S 6,7 0,0 4:59.61 kworker/u:3
18488 root 20 0 0 0 0 S 5,7 0,0 2:06.64 kworker/u:1
31197 root 20 0 0 0 0 S 4,7 0,0 0:13.77 kworker/u:4
23450 root 20 0 0 0 0 S 3,0 0,0 1:34.33 kworker/u:7
27068 root 20 0 0 0 0 S 1,7 0,0 0:51.94 kworker/u:2
mpstat:
CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
all 1,17 0,00 12,67 12,71 3,27 3,05 0,00 0,00 67,13
0 1,41 0,00 7,88 15,42 0,07 0,15 0,00 0,00 75,07
1 0,00 0,00 38,04 3,14 19,20 18,08 0,00 0,00 21,54
2 1,50 0,00 7,55 14,78 0,07 0,02 0,00 0,00 76,08
3 1,09 0,00 7,31 12,15 0,05 0,02 0,00 0,00 79,38
4 1,35 0,00 7,41 12,94 0,07 0,00 0,00 0,00 78,23
5 1,65 0,00 7,78 17,84 0,12 0,03 0,00 0,00 72,57
iostat -x 1:
avg-cpu: %user %nice %system %iowait %steal %idle
0,67 0,00 18,79 3,69 0,00 76,85
Device:  rrqm/s  wrqm/s      r/s      w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0,00    0,00  6952,00  6935,00  27808,00  27740,00      8,00     24,97   1,80     2,00     1,59   0,06  77,90
sda        2,00    0,00  6774,00  6789,00  27104,00  27156,00      8,00     21,26   1,57     1,78     1,36   0,06  77,60
sdd        4,00    4,00  7059,00  7013,00  28252,00  28068,00      8,00    136,01   9,66    10,34     8,98   0,07  99,60
sdc        0,00    0,00  6851,00  6851,00  27404,00  27404,00      8,00     22,80   1,66     1,86     1,46   0,06  77,70
sdf        0,00    0,00  6931,00  6995,00  27724,00  27980,00      8,00     41,78   3,03     3,26     2,80   0,06  79,70
sde        0,00    0,00  6842,00  6837,00  27368,00  27348,00      8,00     31,59   2,31     2,53     2,08   0,06  79,60
another snapshot
avg-cpu: %user %nice %system %iowait %steal %idle
0,84 0,00 22,35 2,18 0,00 74,62
Device:  rrqm/s  wrqm/s      r/s      w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        1,00    2,00  8344,00  8400,00  33380,00  33608,00      8,00     67,39   4,06     4,30     3,82   0,06  97,80
sda        1,00    0,00  8305,00  8290,00  33224,00  33160,00      8,00     28,74   1,73     1,94     1,52   0,05  88,40
sdd        5,00    5,00  8393,00  8419,00  33592,00  33696,00      8,00     96,74   5,76     6,02     5,49   0,06  98,80
sdc        0,00    1,00  8199,00  8201,00  32796,00  32808,00      8,00     27,64   1,68     1,92     1,45   0,05  87,80
sdf        1,00    0,00  8332,00  8323,00  33328,00  33292,00      8,00     40,95   2,44     2,66     2,23   0,05  89,30
sde        0,00    0,00  8256,00  8263,00  33024,00  33052,00      8,00     28,94   1,75     1,96     1,54   0,05  89,50
mpstat for the same test with the 3.9 kernel from the "next" tree:
CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
all 0,50 0,00 10,03 1,34 2,01 6,35 0,00 0,00 79,77
0 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
1 0,00 0,00 25,00 0,00 5,00 18,00 0,00 0,00 52,00
2 0,00 0,00 20,83 0,00 5,21 18,75 0,00 0,00 55,21
3 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
4 3,06 0,00 15,31 8,16 0,00 0,00 0,00 0,00 73,47
5 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
So, does anyone have an idea why the real performance is only about 50% of
the theoretical performance? No CPU core is at its limit.
As I said in my other post, I would be interested in solving the problem, but
I am having trouble identifying it.
Peter Landmann