From: Stan Hoeppner <stan@hardwarefreak.com>
To: lilofile <lilofile@aliyun.com>,
	Linux RAID <linux-raid@vger.kernel.org>,
	Shaohua Li <shli@kernel.org>
Subject: Re: md raid5 performace 6x SSD RAID5
Date: Sun, 01 Dec 2013 23:51:36 -0600
Message-ID: <529C1FE8.90806@hardwarefreak.com>
In-Reply-To: <2cd32253-a361-4c98-a7b4-f373a9959c39@aliyun.com>

On 12/1/2013 9:48 PM, lilofile wrote:
> #1 will eventually be addressed with a multi-thread patch to the various RAID drivers including RAID5
> 
> What is the difference between the multi-thread patch and CONFIG_MULTICORE_RAID456?

I can't find the original description for that option, but I can tell
you that:

1.  It was experimental
2.  Neil Brown requested its complete removal from git in March 2013:

http://permalink.gmane.org/gmane.linux.kernel.commits.head/372527

> My understanding is that with CONFIG_MULTICORE_RAID456 the following
> operations on a stripe can be scheduled to run on other CPUs:
>
>  enum {
> 	STRIPE_OP_BIOFILL,
> 	STRIPE_OP_COMPUTE_BLK,
> 	STRIPE_OP_PREXOR,
> 	STRIPE_OP_BIODRAIN,
> 	STRIPE_OP_RECONSTRUCT,
> 	STRIPE_OP_CHECK,
> };
>
> while the multi-thread patch mainly addresses lock contention between threads. Is this understanding correct?

Shaohua Li has been working on multi-threaded md drivers to fix the CPU
bottleneck with SSD storage for some time now.  He's currently focusing
on raid5.c.  See:
http://lwn.net/Articles/500200/
http://www.spinics.net/lists/raid/msg44699.html

AFAIK this work is neither complete nor thoroughly tested, and it is not
yet included in a stable release.  Shaohua, could you give us a quick
update on the status of your RAID5 multi-thread work?  Demand for it
seems to be increasing steeply: there is this current thread, and
another last week about slow RAID10 on the new hybrid SSD/rust drives.

> ------------------------------------------------------------------
> From: lilofile <lilofile@aliyun.com>
> Sent: 2013-11-28 (Thursday) 19:54
> To: stan <stan@hardwarefreak.com>; Linux RAID <linux-raid@vger.kernel.org>
> Subject: Re: Re: md raid5 performace 6x SSD RAID5
> 
> I have changed stripe_cache_size from 4096 to 8192. The test results show a performance improvement of less than 5%, so the effect is not very noticeable.

IIRC, this was before you started testing with FIO.  I'd really like to
see your streaming read/write results of FIO with the command line I
gave you, for each of these 3 stripe_cache_size values.  BTW, you don't
need to set a timer.  The size=30G limits the test to 30GB.  I chose
this value because the test runs should only take 15s at this size.  Go
any smaller and it makes capturing accurate data more difficult.
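
When comparing stripe_cache_size values it helps to know what each one
costs in RAM.  The kernel md documentation gives the memory consumed as
system_page_size * nr_disks * stripe_cache_size.  Below is a minimal
Python sketch of that arithmetic for the 6-drive array in this thread.
It assumes 4 KiB pages (check with "getconf PAGESIZE"), and the function
name is mine, for illustration only:

```python
def stripe_cache_bytes(stripe_cache_size, nr_disks, page_size=4096):
    """Memory used by the md stripe cache, per the md documentation formula.

    page_size=4096 is an assumption; it is the usual value on x86-64.
    """
    return page_size * nr_disks * stripe_cache_size


if __name__ == "__main__":
    # The two values tried so far in this thread, on a 6-drive array.
    for entries in (4096, 8192):
        mib = stripe_cache_bytes(entries, nr_disks=6) / 2**20
        print(f"stripe_cache_size={entries}: {mib:.0f} MiB")
```

With 32GB of memory in the box, neither setting is anywhere near a
memory constraint.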

The reason for running the streaming tests is that it eliminates the RMW
code path and any associated latencies you get with the random write
test.  The command line I gave you should give us an idea of the peak
streaming read/write throughput of your SSD RAID5 array with the only
limitation being single core performance.
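
To illustrate why a full-stripe write needs no reads while a partial
write does, here is a small Python sketch of generic RAID5 parity math.
It is not the md implementation, only the XOR identity behind it, and
the helper names are hypothetical.  With a full stripe the parity is
computed from the new data alone.  With a partial write the old data and
old parity must first be read back (the RMW path) to get the same
result:

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks):
    """Parity for a full-stripe write: needs only the new data."""
    return xor_blocks(*data_blocks)


def rmw_parity(old_parity, old_data, new_data):
    """Parity for a partial write: old data and old parity are read first."""
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    # Five data blocks, as on a 6-drive RAID5 (one block per stripe is parity).
    data = [bytes([i] * 4) for i in range(1, 6)]
    parity = full_stripe_parity(data)

    # Overwrite one block and update the parity via read-modify-write.
    new_block = bytes([0xFF] * 4)
    updated = rmw_parity(parity, data[2], new_block)

    # Recomputing from the complete new stripe gives the same parity.
    data[2] = new_block
    print(updated == full_stripe_parity(data))
```

Both paths produce identical parity.  The difference is that the RMW
path adds two reads per update, which is the latency the streaming test
is meant to avoid.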

To discover how much CPU is being burned, run the following concurrently
with each FIO test, once FIO initialization is complete and the actual
read/write tests begin.  This will show us what your CPU consumption
looks like and whether you're hitting the single core ceiling with the
md write thread.  It gives you 20 seconds of CPU stats polled every
.5s:

~# top -b -n 40 -d 0.5 |grep Cpu|mawk '{print ($1,$3,$4) }'

This will generate a lot of output.  Piping through mawk will clean this
up making it easier to see which CPU is running the md write thread
during your write tests.  The FIO threads will execute in user space,
the md write thread in system space.  You won't see one core peaking
during read tests as any/all CPUs may be used.
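
If top on your system prints only the aggregate Cpu(s) line in batch
mode (that depends on its saved display settings), a rough alternative
is to sample /proc/stat directly.  The following is a minimal Python
sketch, not a polished tool.  The function names are mine, and it
assumes the usual Linux /proc/stat layout of user, nice, system, idle,
iowait, irq, softirq, steal after the cpuN label:

```python
import time


def parse_proc_stat(text):
    """Return {"cpu0": [jiffies, ...], ...} for the per-core lines only."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            stats[fields[0]] = [int(v) for v in fields[1:9]]
    return stats


def usage_percent(before, after):
    """Return {cpu: (user_pct, system_pct)} between two samples."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        total = sum(delta)
        if total <= 0:
            continue
        user = delta[0] + delta[1]  # user + nice
        system = delta[2]           # system time only, excluding irq/softirq
        result[cpu] = (100.0 * user / total, 100.0 * system / total)
    return result


def read_sample(path="/proc/stat"):
    with open(path) as f:
        return parse_proc_stat(f.read())


def monitor(samples=40, interval=0.5):
    """Print the core with the highest system time for each interval."""
    prev = read_sample()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_sample()
        pct = usage_percent(prev, cur)
        if pct:
            busiest = max(pct, key=lambda c: pct[c][1])
            us, sy = pct[busiest]
            print(f"{busiest}: {us:5.1f}% us {sy:5.1f}% sy")
        prev = cur
```

Calling monitor() while the write test runs covers the same 20 seconds
at .5s intervals.  One core sitting near 100% sy for the whole run would
point to the md write thread ceiling.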

Which kernel version are you using?  I don't recall you saying.  With
later kernels IIRC the parity calculations are offloaded to another
thread, so you may see high load on two cores.

> ------------------------------------------------------------------
> From: Stan Hoeppner <stan@hardwarefreak.com>
> Sent: 2013-11-28 (Thursday) 12:41
> To: lilofile <lilofile@aliyun.com>; Linux RAID <linux-raid@vger.kernel.org>
> Subject: Re: Re: md raid5 performace 6x SSD RAID5
> 
> On 11/27/2013 7:51 AM, lilofile wrote:
>> additional: CPU: Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
>>                 memory:32GB
> ...
>> I created a RAID5 array from six SSDs (sTEC s840)
>> with stripe_cache_size set to 4096.
>> root@host1:/sys/block/md126/md# cat /proc/mdstat 
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
>> md126 : active raid5 sdg[6] sdf[4] sde[3] sdd[2] sdc[1] sdb[0]
>>       3906404480 blocks super 1.2 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]
>>
>> the single ssd read/write performance :
>>
>> root@host1:~# dd if=/dev/sdb of=/dev/zero count=100000 bs=1M
>> ^C76120+0 records in
>> 76119+0 records out
>> 79816556544 bytes (80 GB) copied, 208.278 s, 383 MB/s
>>
>> root@host1:~# dd of=/dev/sdb if=/dev/zero count=100000 bs=1M
>> 100000+0 records in
>> 100000+0 records out
>> 104857600000 bytes (105 GB) copied, 232.943 s, 450 MB/s
>>
>> The RAID array's performance is approx. 1.8GB/s read and 1.1GB/s write:
>> root@sc0:/sys/block/md126/md# dd if=/dev/zero of=/dev/md126 count=100000 bs=1M
>> 100000+0 records in
>> 100000+0 records out
>> 104857600000 bytes (105 GB) copied, 94.2039 s, 1.1 GB/s
>>
>>
>> root@sc0:/sys/block/md126/md# dd of=/dev/zero if=/dev/md126 count=100000 bs=1M
>> 100000+0 records in
>> 100000+0 records out
>> 104857600000 bytes (105 GB) copied, 59.5551 s, 1.8 GB/s
>>
>> Why is the performance so bad, especially the write performance?
> 
> There are 3 things that could be, or are, limiting performance here.
> 
> 1.  The RAID5 write thread peaks one CPU core as it is single threaded
> 2.  A stripe_cache_size of 4096 is too small for 6 SSDs, try 8192
> 3.  dd issues IOs serially and will thus never saturate the hardware
> 
> #1 will eventually be addressed with a multi-thread patch to the various
> RAID drivers including RAID5.  There is no workaround at this time.
> 
> To address #3 use FIO or a similar testing tool that can issue IOs in
> parallel.  With SSD based storage you will never reach maximum
> throughput with a serial data stream.
> 
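
As a sanity check on the dd figures quoted above, here is a small
Python sketch that recomputes the throughput from the byte counts and
elapsed times dd reported, and compares the array numbers against a
deliberately naive ceiling of (drives - 1) times the single-drive rate.
That ceiling is my assumption for illustration only: it ignores parity
computation, controller and bus limits, and the serial nature of dd.

```python
def mb_per_s(nbytes, seconds):
    """Throughput in decimal MB/s, the unit dd reports."""
    return nbytes / seconds / 1e6


def naive_ceiling(single_drive_mb_s, nr_disks):
    """Naive RAID5 streaming ceiling: one drive's worth of capacity is parity."""
    return single_drive_mb_s * (nr_disks - 1)


if __name__ == "__main__":
    # Byte counts and times copied from the dd output quoted above.
    single_read = mb_per_s(79816556544, 208.278)
    single_write = mb_per_s(104857600000, 232.943)
    array_read = mb_per_s(104857600000, 59.5551)
    array_write = mb_per_s(104857600000, 94.2039)

    for name, measured, single in (
        ("read", array_read, single_read),
        ("write", array_write, single_write),
    ):
        ceiling = naive_ceiling(single, nr_disks=6)
        print(f"{name}: {measured:.0f} MB/s measured, "
              f"{ceiling:.0f} MB/s naive ceiling, "
              f"{100 * measured / ceiling:.0f}% of ceiling")
```

By this rough measure the write path falls much further below the naive
ceiling than the read path does, which fits a single-threaded write
bottleneck.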

