RAID 5 doesn't scale

* RAID 5 doesn't scale
@ 2013-04-03 11:00 Peter Landmann
  2013-04-03 11:21 ` Benjamin ESTRABAUD
  2013-04-03 13:18 ` Stan Hoeppner
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Landmann @ 2013-04-03 11:00 UTC
  To: linux-raid

Hi,

I wrote about this at http://article.gmane.org/gmane.linux.raid/42365 but want
to go into more detail. Maybe there is another problem, or a flaw in my
thinking.

Environment:
HW: AMD Phenom II 1055T 2,8 GHz, 8GB ram
    Intel X25-M G2 Postville 80 GB SATA2 SSD
SW: kernel 3.4.0, but same performance with 3.8 from git and 3.9 from the "next" tree
    distribution: debian sid
Raid Settings: 
    on each SSD a 10 GB partition is used, leaving 70 GB spare capacity
    noop-scheduler
    raid creation:
    mdadm --create /dev/md9 --force --raid-devices=4 --chunk=64 --assume-clean --level=5 /dev/sdb1 /dev/sdc1 ..

FIO settings:
bs=4096
iodepth=248
direct=1
continue_on_error=1
rw=randwrite
ioengine=libaio
norandommap
refill_buffers
group_reporting
[test1]
numjobs=1


Theoretical performance: in single mode without RAID each SSD writes 20k IOPS
and reads 40k IOPS.
With RAID 5 and at least 4 SSDs there are as many read operations as write
operations on the member devices. So a single SSD should deliver 13333 read
and 13333 write operations per second.
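
The arithmetic behind these theoretical figures can be sketched as follows.
This is only a simple model, not a measurement. It assumes each SSD needs
1/40000 s per read and 1/20000 s per write, and that a 4K write on an array
of 4 or more SSDs costs 2 reads and 2 writes. The 3-SSD case (1 read, 2
writes) is an assumption chosen because it reproduces the 24000 figure in
the table below. The function name is made up for illustration.

```python
def theoretical_write_iops(n_ssds, read_iops=40_000, write_iops=20_000):
    """Upper bound on 4K random-write IOPS of the array under the simple model."""
    # Assumption: with 3 SSDs, one read (the other data block) plus two writes
    # is enough; with 4 or more, read old data + old parity, then write both.
    reads, writes = (1, 2) if n_ssds == 3 else (2, 2)
    # Device time per array write is reads/read_iops + writes/write_iops,
    # and that work is spread evenly over n_ssds devices.
    return n_ssds * read_iops * write_iops / (reads * write_iops + writes * read_iops)


for n in (3, 4, 5, 6):
    print(n, int(theoretical_write_iops(n)))
# prints:
# 3 24000
# 4 26666
# 5 33333
# 6 40000
```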

Without RAID, a maximum of 140000 random read and 120000 random write
operations per second is achieved, so the hardware shouldn't be the limiting
factor for RAID 5.


Evaluation: Random write in IOPS
#SSD experimental    theoretical
3  14497.7           24000
4  14005             26666
5  17172.3           33333
6  19779             40000

The following stats and output are for RAID 5 with 6 SSDs:

fio:
ssd10gbraid5rw: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=248
2.0.8
Starting 1 process

ssd10gbraid5rw: (groupid=0, jobs=1): err= 0: pid=32400
  Description  : [SSD 10GB raid5 (mdadm) random write test]
  write: io=988.0KB, bw=79133KB/s, iops=19783 , runt=5300335msec
    slat (usec): min=3 , max=282137 , avg= 7.46, stdev=36.26
    clat (usec): min=250 , max=338796K, avg=12525.28, stdev=136706.65
     lat (usec): min=259 , max=338796K, avg=12533.00, stdev=136706.66
    clat percentiles (usec):
     |  1.00th=[ 1048],  5.00th=[ 2096], 10.00th=[ 2672], 20.00th=[ 3504],
     | 30.00th=[ 4576], 40.00th=[ 6496], 50.00th=[ 8512], 60.00th=[11456],
     | 70.00th=[15168], 80.00th=[20352], 90.00th=[28544], 95.00th=[33536],
     | 99.00th=[39168], 99.50th=[41216], 99.90th=[56064], 99.95th=[292864],
     | 99.99th=[309248]
    bw (KB/s)  : min= 6907, max=100088, per=100.00%, avg=79313.22, stdev=8802.19
    lat (usec) : 500=0.05%, 750=0.27%, 1000=0.52%
    lat (msec) : 2=3.52%, 4=20.98%, 10=30.25%, 20=23.99%, 50=20.29%
    lat (msec) : 100=0.03%, 250=0.01%, 500=0.10%, 750=0.01%, 1000=0.01%
    lat (msec) : 2000=0.01%, >=2000=0.01%
  cpu          : usr=7.75%, sys=21.55%, ctx=47382311, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=0/w=0/d=104857847, short=r=0/w=0/d=0
     errors    : total=0, first_error=0/<(null)>

Run status group 0 (all jobs):
  WRITE: io=409601MB, aggrb=79132KB/s, minb=79132KB/s, maxb=79132KB/s, mint=5300335msec, maxt=5300335msec

Disk stats (read/write):
    md9: ios=84/104857172, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=34949993/34951372, aggrmerge=401/512, aggrticks=130838494/122401043, aggrin_queue=253198596, aggrutil=96.05%
  sdb: ios=34950097/34951445, merge=400/511, ticks=130214828/121603063, in_queue=251778978, util=95.86%
  sdc: ios=34952941/34954281, merge=399/516, ticks=130736987/122271756, in_queue=252969493, util=95.91%
  sdd: ios=34943892/34945256, merge=417/527, ticks=131734001/123258071, in_queue=254949447, util=95.89%
  sde: ios=34954980/34956283, merge=367/473, ticks=125822046/117619660, in_queue=243399327, util=95.95%
  sdf: ios=34952583/34954080, merge=415/532, ticks=137200055/128624635, in_queue=265784289, util=96.05%
  sdg: ios=34945469/34946890, merge=408/517, ticks=129323047/121029077, in_queue=250310045, util=95.99%

top:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 4525 root      20   0     0    0    0 R  39,6  0,0  98:16.78 md9_raid5
32400 root      20   0 79716 1824  420 S  30,6  0,1   0:02.77 fio
29099 root      20   0     0    0    0 R   7,3  0,0   0:33.90 kworker/u:0
31740 root      20   0     0    0    0 S   6,7  0,0   4:59.61 kworker/u:3
18488 root      20   0     0    0    0 S   5,7  0,0   2:06.64 kworker/u:1
31197 root      20   0     0    0    0 S   4,7  0,0   0:13.77 kworker/u:4
23450 root      20   0     0    0    0 S   3,0  0,0   1:34.33 kworker/u:7
27068 root      20   0     0    0    0 S   1,7  0,0   0:51.94 kworker/u:2

mpstat:
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
all    1,17    0,00   12,67   12,71    3,27    3,05    0,00    0,00   67,13
0    1,41    0,00    7,88   15,42    0,07    0,15    0,00    0,00   75,07
1    0,00    0,00   38,04    3,14   19,20   18,08    0,00    0,00   21,54
2    1,50    0,00    7,55   14,78    0,07    0,02    0,00    0,00   76,08
3    1,09    0,00    7,31   12,15    0,05    0,02    0,00    0,00   79,38
4    1,35    0,00    7,41   12,94    0,07    0,00    0,00    0,00   78,23
5    1,65    0,00    7,78   17,84    0,12    0,03    0,00    0,00   72,57

iostat -x 1:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,67    0,00   18,79    3,69    0,00   76,85

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0,00     0,00 6952,00 6935,00 27808,00 27740,00     8,00    24,97    1,80    2,00    1,59   0,06  77,90
sda               2,00     0,00 6774,00 6789,00 27104,00 27156,00     8,00    21,26    1,57    1,78    1,36   0,06  77,60
sdd               4,00     4,00 7059,00 7013,00 28252,00 28068,00     8,00   136,01    9,66   10,34    8,98   0,07  99,60
sdc               0,00     0,00 6851,00 6851,00 27404,00 27404,00     8,00    22,80    1,66    1,86    1,46   0,06  77,70
sdf               0,00     0,00 6931,00 6995,00 27724,00 27980,00     8,00    41,78    3,03    3,26    2,80   0,06  79,70
sde               0,00     0,00 6842,00 6837,00 27368,00 27348,00     8,00    31,59    2,31    2,53    2,08   0,06  79,60

another snapshot
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,84    0,00   22,35    2,18    0,00   74,62

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               1,00     2,00 8344,00 8400,00 33380,00 33608,00     8,00    67,39    4,06    4,30    3,82   0,06  97,80
sda               1,00     0,00 8305,00 8290,00 33224,00 33160,00     8,00    28,74    1,73    1,94    1,52   0,05  88,40
sdd               5,00     5,00 8393,00 8419,00 33592,00 33696,00     8,00    96,74    5,76    6,02    5,49   0,06  98,80
sdc               0,00     1,00 8199,00 8201,00 32796,00 32808,00     8,00    27,64    1,68    1,92    1,45   0,05  87,80
sdf               1,00     0,00 8332,00 8323,00 33328,00 33292,00     8,00    40,95    2,44    2,66    2,23   0,05  89,30
sde               0,00     0,00 8256,00 8263,00 33024,00 33052,00     8,00    28,94    1,75    1,96    1,54   0,05  89,50

mpstat for same test with 3.9 kernel from next-tree
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
all    0,50    0,00   10,03    1,34    2,01    6,35    0,00    0,00   79,77
0    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00  100,00
1    0,00    0,00   25,00    0,00    5,00   18,00    0,00    0,00   52,00
2    0,00    0,00   20,83    0,00    5,21   18,75    0,00    0,00   55,21
3    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00  100,00
4    3,06    0,00   15,31    8,16    0,00    0,00    0,00    0,00   73,47
5    0,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00  100,00


Do you have any idea why the real performance is only 50% of the theoretical
performance? No CPU core is at its limit.
As I said in my other post, I would be interested in solving the problem, but I
am having trouble identifying it.

Peter Landmann




* Re: RAID 5 doesn't scale
  2013-04-03 11:00 RAID 5 doesn't scale Peter Landmann
@ 2013-04-03 11:21 ` Benjamin ESTRABAUD
  2013-04-03 18:34   ` Martin Wilck
  2013-04-03 13:18 ` Stan Hoeppner
  1 sibling, 1 reply; 17+ messages in thread
From: Benjamin ESTRABAUD @ 2013-04-03 11:21 UTC
  To: linux-raid

On 03/04/13 12:00, Peter Landmann wrote:
> Hi,
Hi,
>
> i wrote it there http://article.gmane.org/gmane.linux.raid/42365 but want to go
> in detail. Maybe there is another problem or
> problem in my thinking.
>
> Environment:
> HW: AMD Phenom II 1055T 2,8 GHz, 8GB ram
>      Intel X25-M G2 Postville 80 GB SATA2 SSD
> SW: kernel 3.4.0 but same performance with 3.8 from git and 3.9 from "next" tree
>      distribution: debian sid
> Raid Settings:
>      for each hdd a 10 GB partition is used, 70 GB spare capacity
>      noop-scheduler
>      raid creation:
>      mdadm --create /dev/md9 --force --raid-devices=4 --chunk=64 --assume-clean --level=5 /dev/sdb1 /dev/sdc1 ..
So here your RAID5 has a chunk size of 64K, and you have 4 drives in a 
RAID 5, so your stripe size is 192KB if I'm correct.
> FIO settings:
> bs=4096
> iodepth=248
> direct=1
> continue_on_error=1
> rw=randwrite
> ioengine=libaio
> norandommap
> refill_buffers
> group_reporting
> [test1]
> numjobs=1
>
It seems that you are running random 4K writes on this array (unless you
are running the test on the SSD directly here?). If so, you are writing
lots of 4K sectors on independent 192KB stripes. This means that the
whole 192KB stripe first needs to be read, copied to memory, modified
with the new 4K of data, have its parity calculated, and then be
rewritten to the underlying disks. On top of that, depending on your
SSD, there might be some read-modify-write cycles happening in the
background (since you might be running more small random IOs than the
underlying flash can handle transparently). A performance hit is
therefore plausible.

My guess is that to maximize performance, you would want to first
run IOs which minimize the read/modify/write on the RAID itself (so
writing full 192KB IOs, making sure they are also aligned correctly with
the underlying RAID), and also maybe tune your RAID chunk size to
minimize possible RMW cycles on the SSD. However, the SSD aspect is
unlikely to be the cause of your performance issue if you get good
performance writing 4K blocks on the SSD itself.

So it would seem to me that what's killing your performance is the RMW
on the RAID itself: every time you want to write 4K, a whole stripe has to
be read and modified in memory, and 192K of data has to be rewritten to the
array, making it highly inefficient.

A smaller chunk size might help with handling this kind of IO. The
thing here is that you have to ask yourself if 4K random writes are
really what you are going to run, or if this was just for the sake of
testing?

You could also test read performance (no RMW hit) to see if there is no 
bottleneck there (thus partially confirming the above).

Also, don't take my word for it just yet, maybe wait for confirmation 
from some other people on this ML, the above is what I *think* is 
happening but I could definitely be completely wrong.

Regards,

Ben.



* Re: RAID 5 doesn't scale
  2013-04-03 11:21 ` Benjamin ESTRABAUD
@ 2013-04-03 18:34   ` Martin Wilck
  2013-04-03 20:38     ` Peter Landmann
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Wilck @ 2013-04-03 18:34 UTC
  To: Benjamin ESTRABAUD; +Cc: linux-raid

On 04/03/2013 01:21 PM, Benjamin ESTRABAUD wrote:

> It seems that you are running random 4K writes on this array (unless you 
> are running the test on the SSD directly here?). If so, you are writing 
> lots of 4K sectors on independent 192KB stripes. This means that the 
> whole 192KB of stripe needs to be first read, copied to memory, modified 
> with the new 4K of data, have its parity calculated and the new stripe 
> rewritten to the underlying disks.

That's not strictly necessary. For each 4k block to be written, it's
sufficient to read the data block and the corresponding parity block
(2x4k), calculate the changes in the parity block from the difference
between the old and new data block, and write both data and parity
(2x4k). Thus for every write IOP, 4 RAID IOPS are needed (2x read, 2x
write).

Doesn't MD do it this way?
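
The parity update described above can be sketched with plain XOR parity.
This is a minimal illustration of the arithmetic, not md's actual code; the
function names and the 4-drive layout are assumptions for the example.

```python
import os

BLOCK = 4096  # bytes per block, matching the 4k writes discussed above


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity from the old parity and the change in one data block."""
    return xor(old_parity, xor(old_data, new_data))


# Three data blocks plus parity: one 4k slice of a 4-drive RAID 5 stripe.
data = [os.urandom(BLOCK) for _ in range(3)]
parity = xor(xor(data[0], data[1]), data[2])

# Overwrite block 1: read old data + old parity (2 reads), then write
# new data + new parity (2 writes). The other blocks are never touched.
new_block = os.urandom(BLOCK)
new_parity = rmw_parity(parity, data[1], new_block)
data[1] = new_block

# Same result as recomputing the parity from every data block.
assert new_parity == xor(xor(data[0], data[1]), data[2])
```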

Martin


* Re: RAID 5 doesn't scale
  2013-04-03 18:34   ` Martin Wilck
@ 2013-04-03 20:38     ` Peter Landmann
  2013-04-04 13:40       ` Benjamin ESTRABAUD
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Landmann @ 2013-04-03 20:38 UTC
  To: linux-raid

Martin Wilck <mwilck <at> arcor.de> writes:


> 
> Doesn't MD do it this way?
> 
> Martin

You are right. You can see it with mpstat.
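
The fio disk stats from the first message are consistent with this. A rough
cross-check, using only numbers copied from that output (the variable names
are illustrative; aggrios equals the mean of the six per-device ios values
listed there):

```python
# Figures copied from the fio "Disk stats" output earlier in the thread.
md_writes = 104_857_172      # writes issued to md9
member_reads = 34_949_993    # aggrios: mean reads per member device
member_writes = 34_951_372   # aggrios: mean writes per member device
n_members = 6

ios_per_array_write = n_members * (member_reads + member_writes) / md_writes
reads_per_array_write = n_members * member_reads / md_writes

print(round(ios_per_array_write, 2), round(reads_per_array_write, 2))
# prints: 4.0 2.0
```

So each 4K write to md9 turned into roughly 2 reads and 2 writes on the
member devices.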

Peter



* Re: RAID 5 doesn't scale
  2013-04-03 20:38     ` Peter Landmann
@ 2013-04-04 13:40       ` Benjamin ESTRABAUD
  0 siblings, 0 replies; 17+ messages in thread
From: Benjamin ESTRABAUD @ 2013-04-04 13:40 UTC
  To: linux-raid

On 03/04/13 21:38, Peter Landmann wrote:
> Martin Wilck <mwilck <at> arcor.de> writes:
>
>
>> Doesn't MD do it this way?
>>
>> Martin
> You are right. You can see it with mpstat.
Thanks both for that, I wasn't aware of this.
> Peter



* Re: RAID 5 doesn't scale
  2013-04-03 11:00 RAID 5 doesn't scale Peter Landmann
  2013-04-03 11:21 ` Benjamin ESTRABAUD
@ 2013-04-03 13:18 ` Stan Hoeppner
  2013-04-03 15:23   ` keld
                     ` (4 more replies)
  1 sibling, 5 replies; 17+ messages in thread
From: Stan Hoeppner @ 2013-04-03 13:18 UTC
  To: Peter Landmann; +Cc: linux-raid

On 4/3/2013 6:00 AM, Peter Landmann wrote:

You didn't mention your stripe_cache_size value.  It'll make a lot of
difference.  Make sure it's at least 4096.  The default is 256.

~$ /bin/echo 4096 > /sys/block/md[X]/md/stripe_cache_size
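
The same setting can be scripted. The helper below is a hypothetical sketch:
the function name and the sysfs_root parameter are made up for illustration
(the latter only so it can be tried against a scratch directory), while the
sysfs path is the one from the command above. It needs root on a real system.

```python
from pathlib import Path


def set_stripe_cache_size(md_name, value, sysfs_root="/sys"):
    """Write stripe_cache_size for e.g. md_name="md9" and read it back."""
    path = Path(sysfs_root) / "block" / md_name / "md" / "stripe_cache_size"
    path.write_text(f"{value}\n")
    # Read back what is now stored, so the caller can confirm the change.
    return int(path.read_text())
```

For example, set_stripe_cache_size("md9", 4096) corresponds to the echo
command above with md[X] replaced by md9.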

> FIO settings:
> bs=4096
> iodepth=248
> direct=1
> continue_on_error=1
> rw=randwrite
> ioengine=libaio
> norandommap
> refill_buffers
> group_reporting

> numjobs=1

^^^^^^^^^^^  Even when using AIO you're still serialized when using a
single thread, regardless of queue depth.  Thus there is non trivial
latency between IO operations.  Retest with only these global parameters
to get some concurrency.  Along with a larger stripe cache your numbers
should go up substantially.  This test runs 4 threads/core to ensure you
saturate md with IO.

[global]
zero_buffers
numjobs=24
thread
group_reporting
blocksize=4096
ioengine=libaio
iodepth=16
direct=1
size=8G
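
As a quick sanity check of what these parameters mean for concurrency, here
is the arithmetic as a small sketch. The core count of 6 is an assumption
read off the mpstat output in the original message (CPUs 0 to 5).

```python
cores = 6       # assumption: mpstat in the original message lists CPUs 0-5
numjobs = 24    # from the [global] section above
iodepth = 16    # from the [global] section above

threads_per_core = numjobs // cores
max_in_flight = numjobs * iodepth   # upper bound on outstanding I/Os

print(threads_per_core, max_in_flight)
# prints: 4 384
```

That compares with at most 1 x 248 = 248 outstanding I/Os, all submitted
from a single thread, in the original job.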

> So you have an idea why the real performance is only 50% of the theoretical 
> performance? 

Three reasons:  IO latency, limited stripe_cache_size, parity RMW

> No cpu core is at its limits.

Because you're not cycle limited but latency limited.  With this FIO
test your CPU burn should increase a bit.

> As i said in my other post. I would be interested to solve the problem but i 
> have problems to identify it.

Note also that you're doing 4KB random writes against RAID5.  This is
going to generate substantial RMW cycles.  The Intel X25-M G2 is not a
speed demon.  Its published max 4KB IOPS throughput is for purely
random writes, not the read+write pattern created by parity RMW.  So
while your random read should get a nice jump with this test, your
random write may not improve as much.  The limitation here is a function
of the SSD controller on the X25-M G2, not md/RAID5.  If you test 5
drives in md/RAID0 you'll see a bump in random write IOPS.

-- 
Stan


* Re: RAID 5 doesn't scale
  2013-04-03 13:18 ` Stan Hoeppner
@ 2013-04-03 15:23   ` keld
  2013-04-03 15:31   ` Peter Landmann
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: keld @ 2013-04-03 15:23 UTC
  To: Stan Hoeppner; +Cc: Peter Landmann, linux-raid

Hi Peter

In general, Linux RAID 5 scales well.
See https://raid.wiki.kernel.org/index.php/Performance

Best regards
keld



* Re: RAID 5 doesn't scale
  2013-04-03 13:18 ` Stan Hoeppner
  2013-04-03 15:23   ` keld
@ 2013-04-03 15:31   ` Peter Landmann
  2013-04-03 18:35     ` Stan Hoeppner
  2013-04-03 18:23   ` Martin Wilck
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Peter Landmann @ 2013-04-03 15:31 UTC
  To: linux-raid

Stan Hoeppner <stan <at> hardwarefreak.com> writes:

> 
> On 4/3/2013 6:00 AM, Peter Landmann wrote:
> 
> You didn't mention your stripe_cache_size value.  It'll make a lot of
> difference.  Make sure it's at least 4096.  The default is 256.

You are quite right.
I increased it to values between 4096 and 32768 and the performance increased
considerably.
I also played a bit with the deadline scheduler parameters, which helped to
increase performance as well.

With RAID 5 and 6 SSDs I got 33936 IOPS (fio settings as before), which is not
far from the theoretical 40000 (I know from earlier tests that the performance
can be increased further with more jobs).

For your info: with RAID 6 and 6 SSDs I got 32526 IOPS, which is also a very
good result.

So I conclude that there is no (big) problem with scalability at this hardware
level, right?

> 
> ^^^^^^^^^^^  Even when using AIO you're still serialized when using a
> single thread, regardless of queue depth.  Thus there is non trivial
> latency between IO operations.  Retest with only these global parameters
> to get some concurrency.  Along with a larger stripe cache your numbers
> should go up substantially.  This test runs 4 threads/core to ensure you
> saturate md with IO.
> 
> [global]
> zero_buffers
> numjobs=24
> thread
> group_reporting
> blocksize=4096
> ioengine=libaio
> iodepth=16
> direct=1
> size=8G
Yeah, that brings me near 40k IOPS (Raid 5, 6 SSDs)
> 
> > So you have an idea why the real performance is only 50% of the theoretical 
> > performance? 
> 
> Three reasons:  IO latency, limited stripe_cache_size, parity RMW
> 
> > No cpu core is at its limits.
> 
> Because you're not cycle limited but latency limited.  With this FIO
> test your CPU burn should increase a bit.
> 
> > As i said in my other post. I would be interested to solve the problem but i 
> > have problems to identify it.
> 
> Note also that you're doing 4KB random writes against RAID5.  This is
> going to generate substantial RMW cycles.  The Intel X25-M G2 is not a
> speed demon.  Its published max 4KB IOPS throughput is for purely
> random writes, not the read+write pattern created by parity RMW.  So
> while your random read should get a nice jump with this test, your
> random write may not improve as much.  The limitation here is a function
> of the SSD controller on the X25-M G2, not md/RAID5.  If you test 5
> drives in md/RAID0 you'll see a bump in random write IOPS.

FYI: The scheduler makes the difference. If you alternate writes and reads in
small steps (R W R R W R W W R ..) then the performance decreases heavily. If you
group read and write operations (20xW  20xR 20xW ..) then the performance will be
better. I tested this without RAID, with a patched fio and the noop scheduler. But
I learned that the deadline scheduler can achieve the same.


Thanks for your information and hints
Peter





* Re: RAID 5 doesn't scale
  2013-04-03 15:31   ` Peter Landmann
@ 2013-04-03 18:35     ` Stan Hoeppner
  0 siblings, 0 replies; 17+ messages in thread
From: Stan Hoeppner @ 2013-04-03 18:35 UTC
  To: Peter Landmann; +Cc: linux-raid

On 4/3/2013 10:31 AM, Peter Landmann wrote:
> Stan Hoeppner <stan <at> hardwarefreak.com> writes:
> 
>>
>> On 4/3/2013 6:00 AM, Peter Landmann wrote:
>>
>> You didn't mention your stripe_cache_size value.  It'll make a lot of
>> difference.  Make sure it's at least 4096.  The default is 256.
> 
> You are very right.
> I increased it to 4096 - 32768 and the performance increased much.

Be careful here.  Increasing stripe_cache_size increases memory
consumption of md dramatically.  Formula:  stripe_cache_size * 4096
bytes * drive_count = RAM usage.  For a 6 drive array that's

stripe_cache_size	RAM consumed
 4096			 96MB
 8192			192MB
16384			384MB
32768			768MB

Thus you want to select a value that gives you the best combination of
performance and lowest memory usage, unless you're not concerned about RAM.
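
The table follows directly from the formula. A small sketch that reproduces
it (the function name is illustrative; MB here means 2^20 bytes, which is
what the table's figures correspond to):

```python
PAGE = 4096  # bytes per cache entry per drive, as in the formula above


def stripe_cache_mb(stripe_cache_size, drive_count):
    """RAM used by the md stripe cache, in units of 2**20 bytes."""
    return stripe_cache_size * PAGE * drive_count / 2**20


for size in (256, 4096, 8192, 16384, 32768):
    print(size, stripe_cache_mb(size, 6))
# prints:
# 256 6.0
# 4096 96.0
# 8192 192.0
# 16384 384.0
# 32768 768.0
```

The first row is the default of 256, which for 6 drives comes to only 6MB.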

> Also i played a bit with deadline parameters and it helped also to increase 
> performance.
...
> With Raid 5 and 6 SSDs i got 33936 IOPS (fio settings as before) which is not 
> far away from theoretical 40000 (i know from former tests that the performance 
> could be increased for some more jobs).

Always test with parallel threads.  If you don't you're not getting a
realistic picture of what md/RAID and the hardware are capable of.

> For your info: With RAID 6 and 6 SSDs I got 32526 IOPS, which is also a very good
> result.
> 
> So I conclude that there is no (big) problem with scalability at this hw level,
> right?

Yes.  What this demonstrates is that one Thuban core at 2.8-3.3GHz can
apparently execute the md/RAID5/6 write threads faster than these 6
X25-M G2 SSDs can sink the writes.  If your CPU were a 1.6GHz Atom and/or
these were newer SATAIII SandForce-based SSDs, you'd max out a CPU core
long before the SSDs ran out of headroom.

> FYI: The scheduler makes a difference. If you alternate writes and reads in
> small steps (R W R R W R W W R ..), performance decreases heavily. If you
> group read and write operations (20xW  20xR 20xW ..), performance is
> better. I tested this without RAID, using a patched fio and the noop scheduler.
> I have since learned that the deadline scheduler can achieve the same effect.

The scheduler can make a difference, but with SSDs noop usually gives
the best results.  With some SATA/drive controller combos deadline may
be better.  CFQ is rarely, if ever, good for performance.

> Thanks for your information and hints

You bet.

-- 
Stan


* Re: RAID 5 doesn't scale
  2013-04-03 13:18 ` Stan Hoeppner
  2013-04-03 15:23   ` keld
  2013-04-03 15:31   ` Peter Landmann
@ 2013-04-03 18:23   ` Martin Wilck
  2013-04-03 20:36     ` Peter Landmann
  2013-04-03 21:15     ` Stan Hoeppner
  2013-04-03 19:56   ` Roy Sigurd Karlsbakk
  2013-04-03 21:12   ` Peter Landmann
  4 siblings, 2 replies; 17+ messages in thread
From: Martin Wilck @ 2013-04-03 18:23 UTC (permalink / raw)
  To: stan; +Cc: Peter Landmann, linux-raid

On 04/03/2013 03:18 PM, Stan Hoeppner wrote:

> You didn't mention your stripe_cache_size value.  It'll make a lot of
> difference.  Make sure it's at least 4096.  The default is 256.

I'm not getting it - why would stripe cache size matter in a random
read/write test? If the disks are large enough and the pattern is really
random, the cache should hardly ever be hit (stripe_cache_size = 4096
corresponds to 16MB of cache per disk, that's 0.01% of disk size for a
160GB SSD).

I read that Peter confirmed the influence of stripe_cache_size, but I'd
like to understand why it matters in this case.
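
The estimate above can be checked with a few lines. This is only a
sketch: the helper name is made up, 4096 bytes per cache entry per disk
is assumed, and disk sizes are taken as decimal gigabytes.

```python
def cache_fraction_percent(stripe_cache_size, disk_bytes, page_size=4096):
    """Per-disk stripe cache size as a percentage of the disk capacity."""
    per_disk_cache = stripe_cache_size * page_size  # 4096 entries -> 16 MiB
    return 100.0 * per_disk_cache / disk_bytes


# A 160GB SSD, as in the estimate above.
print(f"{cache_fraction_percent(4096, 160 * 10**9):.3f}%")
# The 10 GB partitions Peter described in his test setup.
print(f"{cache_fraction_percent(4096, 10 * 10**9):.3f}%")
```

That is about 0.01% of a 160GB disk and about 0.17% of a 10 GB
partition, so the cache is tiny relative to the data set either way.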

Martin


* Re: RAID 5 doesn't scale
  2013-04-03 18:23   ` Martin Wilck
@ 2013-04-03 20:36     ` Peter Landmann
  2013-04-03 21:19       ` Peter Landmann
  2013-04-03 21:24       ` Stan Hoeppner
  2013-04-03 21:15     ` Stan Hoeppner
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Landmann @ 2013-04-03 20:36 UTC (permalink / raw)
  To: linux-raid


> 
> On 04/03/2013 03:18 PM, Stan Hoeppner wrote:
> 
> > You didn't mention your stripe_cache_size value.  It'll make a lot of
> > difference.  Make sure it's at least 4096.  The default is 256.
> 
> I'm not getting it - why would stripe cache size matter in a random
> read/write test? If the disks are large enough and the pattern is really
> random, the cache should hardly ever be hit (stripe_cache_size = 4096
> corresponds to 16MB of cache per disk, that's 0.01% of disk size for a
> 160GB SSD).
> 
> I read that Peter confirmed the influence of stripe_cache_size, but I'd
> like to understand why it matters in this case.
> 
> Martin


I'm very sorry, but I can no longer confirm that stripe_cache_size helps.

My test was too short. The IOPS decrease with every minute, so
stripe_cache_size only helps for very short tests.

I will provide details in another post.

Peter




* Re: RAID 5 doesn't scale
  2013-04-03 20:36     ` Peter Landmann
@ 2013-04-03 21:19       ` Peter Landmann
  2013-04-03 21:24       ` Stan Hoeppner
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Landmann @ 2013-04-03 21:19 UTC (permalink / raw)
  To: linux-raid

 
> I'm very sorry, but I can no longer confirm that stripe_cache_size
> helps.
> 
> My test was too short. The IOPS decrease with every minute, so
> stripe_cache_size only helps for very short tests.
> 
> I will provide details in another post.
> 


Now I wish I could delete that post. Since SSD performance decreases with
running time (within minutes), I'm not sure whether the better performance
with a higher stripe_cache_size would stay constant over time, or whether it
is an effect of a fresh, empty cache that fades with time (in my scenario of
writing many small random blocks).

Short test results (IOPS):

RAID 5, 3 SSDs
stripe_cache_size    noop     deadline (tuned)
256                  18914    18730
16384                18161    19766

RAID 5, 4 SSDs
stripe_cache_size    noop     deadline (tuned)
256                  11863    13716
16384                13186    14688
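
To make the numbers easier to compare, here is a small sketch that
computes the relative change when going from stripe_cache_size 256 to
16384 in each column. The values are copied from the tables above; the
dictionary layout and the names are only for illustration.

```python
# (array, scheduler) -> (result at 256, result at 16384), from the tables above
results = {
    ("3 SSD", "noop"): (18914, 18161),
    ("3 SSD", "deadline (tuned)"): (18730, 19766),
    ("4 SSD", "noop"): (11863, 13186),
    ("4 SSD", "deadline (tuned)"): (13716, 14688),
}


def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return 100.0 * (after - before) / before


for (array, sched), (small, large) in results.items():
    print(f"RAID 5, {array}, {sched}: {percent_change(small, large):+.1f}%")
```

The changes range from roughly -4% to +11%, which is small enough that
short-run noise could account for part of it.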



Peter



* Re: RAID 5 doesn't scale
  2013-04-03 20:36     ` Peter Landmann
  2013-04-03 21:19       ` Peter Landmann
@ 2013-04-03 21:24       ` Stan Hoeppner
  2013-04-03 21:29         ` Peter Landmann
  1 sibling, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2013-04-03 21:24 UTC (permalink / raw)
  To: Peter Landmann; +Cc: linux-raid

On 4/3/2013 3:36 PM, Peter Landmann wrote:

> I'm very sorry, but I can no longer confirm that stripe_cache_size helps.
> 
> My test was too short. The IOPS decrease with every minute, so
> stripe_cache_size only helps for very short tests.

If you're running the tests for multiple minutes and many tens of GBs at
a time, then this slowdown is due to garbage collection, not stripe
cache sizing.

You are not following a proper testing methodology, and you're jumping
to conclusions way too quickly, and likely incorrectly.

If you're not familiar with SSD garbage collection then you must learn
about it.  It will affect everything you do with SSDs, especially when
doing these kinds of tests where you're writing huge amounts of data to
the flash cells.  Wear leveling, part of garbage collection,
dramatically slows down SSD throughput.  And when you're pushing this
much data, TRIM won't help.  It'll actually slow the SSDs down even more.

-- 
Stan


* Re: RAID 5 doesn't scale
  2013-04-03 21:24       ` Stan Hoeppner
@ 2013-04-03 21:29         ` Peter Landmann
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Landmann @ 2013-04-03 21:29 UTC (permalink / raw)
  To: linux-raid

> If you're running the tests for multiple minutes and many tens of GBs at
> a time, then this slowdown is due to garbage collection, not stripe
> cache sizing.

You are right. I wrote about that in another post.
> 
> You are not following a proper testing methodology, and you're jumping
> to conclusions way too quickly, and likely incorrectly.

See above.
> 
> If you're not familiar with SSD garbage collection then you must learn
> about it.  It will affect everything you do with SSDs, especially when
> doing these kinds of tests where you're writing huge amounts of data to
> the flash cells.  Wear leveling, part of garbage collection,
> dramatically slows down SSD throughput.  And when you're pushing this
> much data, TRIM won't help.  It'll actually slow the SSDs down even more.

Being short on time, I was too fast. I'm sorry for the trouble, but at least
some people, including me, learned something about md :)

Peter





* Re: RAID 5 doesn't scale
  2013-04-03 18:23   ` Martin Wilck
  2013-04-03 20:36     ` Peter Landmann
@ 2013-04-03 21:15     ` Stan Hoeppner
  1 sibling, 0 replies; 17+ messages in thread
From: Stan Hoeppner @ 2013-04-03 21:15 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Peter Landmann, linux-raid

On 4/3/2013 1:23 PM, Martin Wilck wrote:
> On 04/03/2013 03:18 PM, Stan Hoeppner wrote:
> 
>> You didn't mention your stripe_cache_size value.  It'll make a lot of
>> difference.  Make sure it's at least 4096.  The default is 256.

Actually, the default is 128, not 256, at least with 3.2.6.  Not sure
about previous/later versions.

> I'm not getting it - why would stripe cache size matter in a random
> read/write test? 

It's very similar to the effect of a greater quantity of write back
cache on a hardware RAID controller.  Which is why it dramatically
affects write throughput but not read.  I believe the proper way to view
this is as a temporary workspace, where md can assemble the stripes to
be written out to the block layer, and store chunks which are read in
for RMW cycles.  As with many things in computing, increasing the size
of this working space allows the md driver to work more efficiently.
See below for exactly how it works.

> If the disks are large enough and the pattern is really
> random, the cache should hardly ever be hit (stripe_cache_size = 4096
> corresponds to 16MB of cache per disk, that's 0.01% of disk size for a
> 160GB SSD).

You seem to be assuming the md "stripe cache" functions like some kind
of generic dumb filesystem cache.  It does not.

> I read that Peter confirmed the influence of stripe_cache_size, but I'd
> like to understand why it matters in this case.

If you think the throughput increase in this thread is impressive, see:
 http://marc.info/?l=linux-raid&m=136241443706663&w=2

About halfway down there is a table showing the effects of
stripe_cache_size from 2048 to 32768.  Write throughput increased by over
600MB/s, from 1018MB/s to 1628MB/s, simply by increasing
stripe_cache_size from 2048 to 4096, and decreased as the stripe cache
was made larger.  Thus every system has a sweet spot.  This was with 5
Intel 500GB SSDs w/the SandForce 2281 controller, attached to an LSI
9207-8i, running md/RAID5.

I'd love to explain exactly how the stripe cache works, but to do that I
must first understand it.  And I've been unable to find documentation
describing the inner workings of the stripe cache.  And since I'm
neither a C nor kernel programmer, I can't look at the code and
understand it, nor then write a document for others.  So if you really
want that explanation you'll need to start another thread and bribe Neil
into explaining it.

-- 
Stan


* Re: RAID 5 doesn't scale
  2013-04-03 13:18 ` Stan Hoeppner
                     ` (2 preceding siblings ...)
  2013-04-03 18:23   ` Martin Wilck
@ 2013-04-03 19:56   ` Roy Sigurd Karlsbakk
  2013-04-03 21:12   ` Peter Landmann
  4 siblings, 0 replies; 17+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-03 19:56 UTC (permalink / raw)
  To: stan; +Cc: linux-raid, Peter Landmann

----- Original message -----
> On 4/3/2013 6:00 AM, Peter Landmann wrote:
> 
> You didn't mention your stripe_cache_size value. It'll make a lot of
> difference. Make sure it's at least 4096. The default is 256.

Looks like Documentation/md.txt (on 3.8.5) says stripe_cache_size, stripe_cache_active and preread_bypass_threshold are only available for RAID-5. How can I tune RAID-6 like this?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all educators to avoid excessive use of idioms of xenotypic etymology. In most cases adequate and relevant synonyms exist in Norwegian.


* Re: RAID 5 doesn't scale
  2013-04-03 13:18 ` Stan Hoeppner
                     ` (3 preceding siblings ...)
  2013-04-03 19:56   ` Roy Sigurd Karlsbakk
@ 2013-04-03 21:12   ` Peter Landmann
  4 siblings, 0 replies; 17+ messages in thread
From: Peter Landmann @ 2013-04-03 21:12 UTC (permalink / raw)
  To: linux-raid


> Note also that you're doing 4KB random writes against RAID5.  This is
> going to generate substantial RMW cycles.  The Intel X25-M G2 is not a
> speed demon.  Its published max 4KB IOPS throughput is for purely
> random writes, not the read+write pattern created by parity RMW.  So
> while your random read should get a nice jump with this test, your
> random write may not improve as much.  The limitation here is a function
> of the SSD controller on the X25-M G2, not md/RAID5.  If you test 5
> drives in md/RAID0 you'll see a bump in random write IOPS.
> 

It seems so. I let fio run a bit longer, and with each setting the SSD
performance decreased after a few minutes, while mpstat still showed ~100%
SSD utilization.






Thread overview: 17+ messages
2013-04-03 11:00 RAID 5 doesn't scale Peter Landmann
2013-04-03 11:21 ` Benjamin ESTRABAUD
2013-04-03 18:34   ` Martin Wilck
2013-04-03 20:38     ` Peter Landmann
2013-04-04 13:40       ` Benjamin ESTRABAUD
2013-04-03 13:18 ` Stan Hoeppner
2013-04-03 15:23   ` keld
2013-04-03 15:31   ` Peter Landmann
2013-04-03 18:35     ` Stan Hoeppner
2013-04-03 18:23   ` Martin Wilck
2013-04-03 20:36     ` Peter Landmann
2013-04-03 21:19       ` Peter Landmann
2013-04-03 21:24       ` Stan Hoeppner
2013-04-03 21:29         ` Peter Landmann
2013-04-03 21:15     ` Stan Hoeppner
2013-04-03 19:56   ` Roy Sigurd Karlsbakk
2013-04-03 21:12   ` Peter Landmann
