* Raid performance problems (pdflush / raid5 eats 100%)
From: Goswin von Brederlow @ 2007-10-02 15:33 UTC
To: linux-raid
Hi,
we (Q-Leap Networks) are in the process of setting up a high-speed
storage cluster, and we are having some problems getting proper
performance.
Our test system is a machine with 2 dual-core CPUs and 2 dual-channel
UW SCSI controllers connected to 2 external raid boxes, and we use
iozone with 16GB of data on a Lustre (ldiskfs) filesystem as the speed
test below. The raid boxes internally run raid6 and are split into 2
partitions, one mapped to each SCSI port. Read-ahead is set to 32768.
sdb system controller 1: box 1 controller 1
sdc system controller 1: box 2 controller 1
sdd system controller 2: box 1 controller 2
sde system controller 2: box 2 controller 2
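For reference, a run of this kind could look roughly like the following
(a sketch only; the Lustre mount point, record size and exact iozone
options are assumptions, not taken from the original runs, and whether
the read-ahead of 32768 was set in sectors or KiB is not stated):
  # read-ahead per member device (blockdev --setra counts 512-byte sectors)
  for d in sdb1 sdc1 sdd1 sde1; do blockdev --setra 32768 /dev/$d; done
  # iozone throughput mode: 4 threads, write/rewrite (-i 0) and
  # read/reread (-i 1), 4GB per thread for 16GB total, fsync timed (-e)
  iozone -i 0 -i 1 -e -t 4 -s 4g -r 1m \
         -F /mnt/lustre/f1 /mnt/lustre/f2 /mnt/lustre/f3 /mnt/lustre/f4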
Plain disks: sdb1 sdc1 sdd1 sde1
--------------------------------
write rewrite read reread
1 Thread : 225204 269084 288718 288219
2 Threads: 401154 414525 441005 440564
3 Threads: 515818 528943 598863 599455
4 Threads: 587184 638971 737094 730850
raid1 [sdb1 sde1] [sdc1 sdd1] chunk=8192
----------------------------------------
write rewrite read reread
1 Thread : 179262 271810 293111 293593
2 Threads: 326260 345276 496189 498250
4 Threads: 333085 308820 686983 679123
8 Threads: 348458 277097 643260 673025
raid10 f2 [sdb1 sdc1 sdd1 sde1] chunk=8192
------------------------------------------
write rewrite read reread
1 Thread : 215560 323921 466460 436195
2 Threads: 288001 304094 611157 586583
4 Threads: 336072 298115 639925 662107
8 Threads: 243053 183969 665743 638512
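For reference, an array like the raid10 f2 one above would typically be
created along these lines (illustrative only; the original post does not
show the actual mdadm invocation, and --chunk is given in KiB):
  mdadm --create /dev/md0 --level=10 --layout=f2 --chunk=8192 \
        --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1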
As you can see, adding a raid1 or raid10 layer already costs a certain
amount of performance, but all within reason. Now comes the real problem:
raid5 [sdb1 sdc1 sdd1 sde1] chunk=64, stripe_cache_size=32768
-----------------------------------------------------------------------
write rewrite read reread
1 Thread : 178540 176061 384928 384653
2 Threads: 218113 214308 379950 376312
4 Threads: 225560 160209 359628 359170
8 Threads: 232252 165669 261981 274043
The performance is totally limited by pdflush (>80% CPU during write),
with md0_raid5 eating up a substantial percentage too.
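For completeness, the raid5 configuration above corresponds roughly to
the following (a sketch; the actual commands are not shown in the post):
  mdadm --create /dev/md0 --level=5 --chunk=64 \
        --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  echo 32768 > /sys/block/md0/md/stripe_cache_size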
raid5 [sdb1 sdc1 sdd1 sde1] chunk=8192, stripe_cache_size=32768
-----------------------------------------------------------
write rewrite read reread
1 Thread : 171138 185105 424504 428974
2 Threads: 165225 141431 553976 545088
4 Threads: 178189 110153 582999 581266
8 Threads: 177892 99679 568720 594580
This is even stranger. Now pdflush uses less CPU (10-70%), but
md0_raid5 is blocking with >95% CPU during write.
Three questions:
1) pdflush is limited to one thread per filesystem. For our usage
that is a bottleneck. Can anything be done there? (See the writeback
knobs sketched below.)
2) Why is read performance so lousy with the small chunk size?
3) Why does raid5 take so much more CPU time on write with the larger
chunk size? The amount of data to checksum is the same (same speed),
but the CPU time used goes way up. There are no read-modify-write
cycles in there according to vmstat, just plain continuous writes.
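Regarding question 1), the writeback behaviour is governed by a handful
of VM sysctls; these are the knobs one would typically look at (the
values shown are purely illustrative, not recommendations from this
thread):
  cat /proc/sys/vm/nr_pdflush_threads        # pdflush threads currently running
  sysctl vm.dirty_background_ratio vm.dirty_ratio
  sysctl -w vm.dirty_background_ratio=5      # start background writeback earlier
  sysctl -w vm.dirty_ratio=20                # throttle writers at 20% dirty memory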
Regards,
Goswin
* Re: Raid performance problems (pdflush / raid5 eats 100%)
From: Justin Piszcz @ 2007-10-02 16:19 UTC
To: Goswin von Brederlow; +Cc: linux-raid
On Tue, 2 Oct 2007, Goswin von Brederlow wrote:
> Hi,
>
> we (Q-Leap Networks) are in the process of setting up a high-speed
> storage cluster, and we are having some problems getting proper
> performance.
[...]
> raid5 [sdb1 sdc1 sdd1 sde1] chunk=64, stripe_cache_size=32768
[...]
> The performance is totally limited by pdflush (>80% CPU during write),
> with md0_raid5 eating up a substantial percentage too.
[...]
Have you tried a 1024k stripe and 16384k stripe_cache_size?
I'd be curious what kind of performance/write speed you get with that
configuration.
Justin.
* Re: Raid performance problems (pdflush / raid5 eats 100%)
From: Goswin von Brederlow @ 2007-10-02 17:11 UTC
To: Justin Piszcz; +Cc: Goswin von Brederlow, linux-raid
Justin Piszcz <jpiszcz@lucidpixels.com> writes:
> Have you tried a 1024k stripe and 16384k stripe_cache_size?
>
> I'd be curious what kind of performance/write speed you get with that
> configuration.
>
> Justin.
stripe_cache_size is not in KiB of memory but in multiples of some
internal structures. So 16384 on each of 4 raids comes to ~1.6GB of
RAM. We only have 2GB of RAM in the system, so this is very near an
OOM just for the stripe cache:
Mem: 2062468k total, 1735048k used, 327420k free, 4096k buffer
Swap: 0k total, 0k used, 0k free, 9924k cached
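As a rough rule of thumb for that estimate (an approximation assuming
one page per member device per cache entry, and ignoring per-entry
struct overhead):
  # per array: stripe_cache_size * PAGE_SIZE * nr_member_devices
  echo $(( 16384 * 4096 * 4 / 1048576 )) MiB   # lower bound for one 4-disk array
  cat /sys/block/md0/md/stripe_cache_size      # query the current setting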
1 Thread:
9442 root 16 0 0 0 0 R 73 0.0 3:33.89 pdflush
9590 root 10 -5 0 0 0 S 57 0.0 1:29.73 md0_raid5
9747 root 25 0 28364 1392 184 R 85 0.1 1:06.09 iozone
257 root 10 -5 0 0 0 D 23 0.0 4:47.50 kswapd1
256 root 10 -5 0 0 0 S 12 0.0 4:11.17 kswapd0
2 Threads:
257 root 11 -5 0 0 0 D 27 0.0 4:55.46 kswapd1
9442 root 16 0 0 0 0 S 23 0.0 4:49.65 pdflush
256 root 10 -5 0 0 0 S 9 0.0 4:26.57 kswapd0
9590 root 10 -5 0 0 0 R 40 0.0 2:36.91 md0_raid5
9596 root 10 -5 0 0 0 S 35 0.0 0:50.62 md1_raid5
9487 root 15 0 0 0 0 S 35 0.0 0:37.14 pdflush
9759 root 18 0 28360 1420 216 R 59 0.1 0:11.61 iozone
9758 root 18 0 28360 1420 216 R 70 0.1 0:10.17 iozone
With 4 raid5s:
write rewrite read reread
1 Thread : 193887 196092 365965 401978 <-- use 1 raid5
2 Threads: 221221 255069 430061 365904 <-- use 2 raid5
4 Threads: 238220 257221 395603 409954 <-- use 4 raid5
With just 1 raid5 (and more free mem):
2 Threads:
9590 root 10 -5 0 0 0 R 93 0.0 7:45.04 md0_raid5
9487 root 16 0 0 0 0 S 27 0.0 5:13.43 pdflush
257 root 10 -5 0 0 0 S 5 0.0 6:28.48 kswapd1
256 root 10 -5 0 0 0 S 5 0.0 5:33.40 kswapd0
9818 root 18 0 28364 1416 208 R 56 0.1 0:47.70 iozone
9819 root 18 0 28364 1416 208 D 62 0.1 0:46.58 iozone
write rewrite read reread
1 Thread : 206299 201120 400244 390185
2 Threads: 210888 202785 411502 400532
4 Threads: 200829 145307 400252 395568
Regards,
Goswin