* Solving the raid write performance problems
From: Peter Landmann @ 2013-04-02 11:36 UTC
  To: linux-raid
Hi,
I'm a university student in the final phase of my studies and am considering
writing my master's thesis on the md RAID performance issues and implementing
a prototype to solve them.
What I have done and know so far:
1. I wrote an (internal) paper measuring SSD RAID performance with the FreeBSD
software RAID implementations and with md RAID under Linux. I tested RAID 0 and
RAID 5 with up to 6 Intel SSDs (X25-M G2, each rated for 20k write and 40k read
IOPS), and especially RAID 5 did not scale. With my fio and general environment
(bs=4k, iodepth=256, direct=1, random writes, 87.5% spare capacity, noop
scheduler, latest mainline kernel from git, AMD Phenom II 1055T @ 2.8 GHz,
8 GB RAM) I got the results below (an fio sketch follows the table):
SSDs    IOPS
3       14497.7
4       14005
5       17172.3
6       19779
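
For reference, a rough sketch of the kind of fio run behind these numbers; the
md device path /dev/md0, the libaio engine, and the 60 s runtime are
assumptions and not part of the measured setup:

#!/usr/bin/env python
# Hedged sketch: drives one fio run with the parameters listed above.
# /dev/md0, libaio and the 60 s runtime are assumptions, not measured facts.
import subprocess

def run_fio(device="/dev/md0", runtime=60):
    cmd = [
        "fio",
        "--name=md-randwrite",
        "--filename=%s" % device,
        "--rw=randwrite",          # random writes
        "--bs=4k",                 # 4 KiB blocks
        "--iodepth=256",           # queue depth 256
        "--direct=1",              # bypass the page cache
        "--ioengine=libaio",       # assumed async I/O engine
        "--runtime=%d" % runtime,
        "--time_based",
        "--group_reporting",
    ]
    subprocess.check_call(cmd)

if __name__ == "__main__":
    run_fio()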
2. AFAIK the main problem is that md uses only one write thread per RAID
instance; there is a patch in the works, but it is still not available.
So my questions:
1. Has this problem been solved (I know it isn't in mainline)? Is there still
some work to do?
2. If not: why hasn't it been solved already (time? technical problem?
priority? not solvable?)
3. Is it the only problem? During my tests I captured detailed CPU stats, and
no CPU core came anywhere near its capacity. Are there other known big reasons
for the performance issues?
For example, 6 SSDs, random write:
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
all    1.17    0.00   12.67   12.71    3.27    3.05    0.00    0.00   67.13
  0    1.41    0.00    7.88   15.42    0.07    0.15    0.00    0.00   75.07
  1    0.00    0.00   38.04    3.14   19.20   18.08    0.00    0.00   21.54
  2    1.50    0.00    7.55   14.78    0.07    0.02    0.00    0.00   76.08
  3    1.09    0.00    7.31   12.15    0.05    0.02    0.00    0.00   79.38
  4    1.35    0.00    7.41   12.94    0.07    0.00    0.00    0.00   78.23
  5    1.65    0.00    7.78   17.84    0.12    0.03    0.00    0.00   72.57
4. Is this (bringing the RAID performance to or near the theoretical maximum)
something one person can achieve in less than 6 months without practical
experience in kernel hacking (and I'm not a genius :( )?
Thanks in advance for your responses,
Peter Landmann
* Re: Solving the raid write performance problems
From: Hans-Peter Jansen @ 2013-04-02 13:29 UTC
  To: Peter Landmann; +Cc: linux-raid
On Tuesday, 2 April 2013 11:36:13 Peter Landmann wrote:
> Hi,
> 
> I'm a university student in the final phase of my studies and am considering
> writing my master's thesis on the md RAID performance issues and implementing
> a prototype to solve them.
> [...]
> 4. Is this (bringing the RAID performance to or near the theoretical maximum)
> something one person can achieve in less than 6 months without practical
> experience in kernel hacking (and I'm not a genius :( )?
I would start by testing what's already available. Check out Shaohua Li's
<shli@kernel.org> post "raid5: create multiple threads to handle stripes",
with patches that are available in linux-next:
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=1ae2eeac074fa4511715d988c3fac95b338d00c0
Next, check whether you can address Stan Hoeppner's remarks from that thread.
Compare test runs that vary only the number of cores and the NUMA affinity,
nothing else (a sketch of such a test matrix follows below). Be careful about
the SSD state regarding wear leveling. Provide performance comparison charts.
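
Something along these lines could drive that matrix (untested sketch; the job
file name, the core counts, and the use of CPU hotplug are assumptions):

#!/usr/bin/env python
# Hedged sketch of the core-count comparison suggested above. Assumptions:
# CPU hotplug is enabled (CONFIG_HOTPLUG_CPU), the fio job from the first
# mail is saved as "md-randwrite.fio", and the box has 6 cores. Needs root.
import subprocess

TOTAL_CPUS = 6  # Phenom II 1055T; adjust to the machine under test

def set_online_cpus(n):
    # Keep cpu0..cpu(n-1) online, offline the rest (cpu0 cannot be offlined).
    for cpu in range(1, TOTAL_CPUS):
        with open("/sys/devices/system/cpu/cpu%d/online" % cpu, "w") as f:
            f.write("1" if cpu < n else "0")

for cores in (2, 4, 6):
    set_online_cpus(cores)
    subprocess.check_call(["fio", "md-randwrite.fio"])  # record IOPS per run

set_online_cpus(TOTAL_CPUS)  # restore all cores

# On a NUMA box, the same runs could additionally be repeated under
# "numactl --cpunodebind=<node> --membind=<node>" to vary the NUMA affinity.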
Given the work already done in this area, I would say this is easily achievable
within the given time frame, since only the automatic NUMA affinity adjustments
are left as a new task.
Good luck.
Cheers,
Pete
* Re: Solving the raid write performance problems
From: keld @ 2013-04-02 14:14 UTC
  To: Peter Landmann; +Cc: linux-raid
On Tue, Apr 02, 2013 at 11:36:13AM +0000, Peter Landmann wrote:
> Hi,
> 
> I'm a university student in the final phase of my studies and am considering
> writing my master's thesis on the md RAID performance issues and implementing
> a prototype to solve them.
> [...]
> Thanks in advance for your responses,
> Peter Landmann
Are you only investigating SSDs? Or are you also looking at ordinary rotating
disks?
I think you should also look at RAID1, RAID6, and the RAID10 near, offset, and
far layouts (a creation sketch follows below).
Maybe there are bottlenecks in other places in the system. There is more on
performance and bottlenecks on the wiki.
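
For instance, something like this could set up the three RAID10 layouts for
comparison (untested sketch; device names and the 4-disk count are
placeholders):

#!/usr/bin/env python
# Hedged sketch: create a 4-disk md RAID10 array in each layout so the same
# fio job can be compared across near (n2), offset (o2) and far (f2).
# /dev/md0 and /dev/sd[b-e] are placeholders for the test machine.
import subprocess

DISKS = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]

def create_raid10(layout):
    subprocess.check_call([
        "mdadm", "--create", "/dev/md0",
        "--level=10",
        "--layout=%s" % layout,        # n2 = near, o2 = offset, f2 = far
        "--raid-devices=%d" % len(DISKS),
    ] + DISKS)

def destroy():
    subprocess.check_call(["mdadm", "--stop", "/dev/md0"])
    for d in DISKS:
        subprocess.check_call(["mdadm", "--zero-superblock", d])

for layout in ("n2", "o2", "f2"):
    create_raid10(layout)
    # ... run the fio job from the first mail here ...
    destroy()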
best regards
Keld
* Re: Solving the raid write performance problems
From: Stan Hoeppner @ 2013-04-02 16:10 UTC
  To: Hans-Peter Jansen; +Cc: Peter Landmann, linux-raid
On 4/2/2013 8:29 AM, Hans-Peter Jansen wrote:
> Next, check whether you can address Stan Hoeppner's remarks from that thread.
> 
> Compare test runs that vary only the number of cores and the NUMA affinity,
> nothing else. Be careful about the SSD state regarding wear leveling. Provide
> performance comparison charts.
NUMA affinity is only applicable to multi-socket systems, or to single-socket
systems using a dual-die MCM CPU, such as the AMD Magny-Cours and Interlagos.
Thus any work on NUMA optimization requires such a system, or possibly a
simulator.
-- 
Stan
* Re: Solving the raid write performance problems
From: Piergiorgio Sartor @ 2013-04-02 16:33 UTC
  To: Peter Landmann; +Cc: linux-raid
On Tue, Apr 02, 2013 at 11:36:13AM +0000, Peter Landmann wrote:
> Hi,
> [...]
> 2. AFAIK the main problem is that md uses only one write thread per RAID
> instance; there is a patch in the works, but it is still not available.
> [...]
> 3. Is it the only problem? During my tests I captured detailed CPU stats, and
> no CPU core came anywhere near its capacity. Are there other known big reasons
> for the performance issues?
> [...]
Hi Peter,
I would not be so sure the issue (if any) is in the
multi-threading.
Chunk size and stripe cache size both make a difference
in terms of write performance.
I had a 4-HDD (rotating) RAID-5; changing the stripe
cache from the default (256, I guess) to the maximum possible
(32768) made the writes scale up linearly from the initial
level (1 HDD) to the theoretical maximum (4 HDDs). A sketch of
tuning that knob follows below.
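
Something like this, for example (sketch; md0 is a placeholder, needs root):

#!/usr/bin/env python
# Hedged sketch: bump the raid5/raid6 stripe cache via sysfs, as described
# above. md0 is a placeholder. Note the cache costs roughly
# stripe_cache_size * 4 KiB * number_of_disks of RAM.
STRIPE_CACHE = "/sys/block/md0/md/stripe_cache_size"

with open(STRIPE_CACHE) as f:
    print("old stripe_cache_size: %s" % f.read().strip())   # default is 256

with open(STRIPE_CACHE, "w") as f:
    f.write("32768")                                         # current maximum

with open(STRIPE_CACHE) as f:
    print("new stripe_cache_size: %s" % f.read().strip())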
Furthermore, the motherboard chipset controlling the 6
SSDs can be a bottleneck too.
I've seen huge differences between different types.
My suggestion would be to start with a baseline using
RAID-0 (the same number of SSDs as the RAID-5, or one less)
and then see what the maximum possible is (see the sketch below).
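
For example (untested sketch; device names and the 512 KiB chunk size are
placeholders):

#!/usr/bin/env python
# Hedged sketch: build a RAID-0 baseline and then the RAID-5 under test from
# the same SSDs, so the identical fio job can be run against both.
import subprocess

SSDS = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"]

def create(level, disks):
    subprocess.check_call([
        "mdadm", "--create", "/dev/md0",
        "--level=%d" % level,
        "--chunk=512",
        "--raid-devices=%d" % len(disks),
    ] + disks)

def teardown(disks):
    subprocess.check_call(["mdadm", "--stop", "/dev/md0"])
    for d in disks:
        subprocess.check_call(["mdadm", "--zero-superblock", d])

# RAID-0 baseline on 5 SSDs (one less than the RAID-5, as suggested above).
create(0, SSDS[:-1])
# ... run the fio job here, then tear the array down ...
teardown(SSDS[:-1])

# RAID-5 under test on all 6 SSDs.
create(5, SSDS)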
Hope this helps,
bye,
pg
-- 
piergiorgio