linux-raid.vger.kernel.org archive mirror
* Raid performance problems (pdflush / raid5 eats 100%)
@ 2007-10-02 15:33 Goswin von Brederlow
  2007-10-02 16:19 ` Justin Piszcz
  0 siblings, 1 reply; 3+ messages in thread
From: Goswin von Brederlow @ 2007-10-02 15:33 UTC (permalink / raw)
  To: linux-raid

Hi,

we (Q-Leap Networks) are in the process of setting up a high-speed
storage cluster and are having trouble getting proper performance.

Our test system is a machine with two dual-core CPUs and two dual-channel
UW SCSI controllers connected to two external raid boxes; the speed tests
below use iozone with 16 GB of data on a Lustre (ldiskfs) filesystem.
The raid boxes internally run raid6 and are split into 2 partitions,
one mapped to each SCSI port. Read-ahead is set to 32768.

sdb system controller 1: box 1 controller 1
sdc system controller 1: box 2 controller 1
sdd system controller 2: box 1 controller 2
sde system controller 2: box 2 controller 2
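
(For reference, a minimal sketch of how one of the raid5 configurations
tested below can be created and tuned; device names as above, everything
else is assumed rather than the exact commands used:

  mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=4 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  echo 32768 > /sys/block/md0/md/stripe_cache_size  # cache entries, not KiB
  blockdev --setra 32768 /dev/md0                   # read-ahead, 512-byte sectors
  # raid10 far layout instead: --level=10 --layout=f2 --chunk=8192
)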

Plain disks: sdb1 sdc1 sdd1 sde1
--------------------------------
           write  rewrite read   reread
1 Thread : 225204 269084  288718 288219
2 Threads: 401154 414525  441005 440564
3 Threads: 515818 528943  598863 599455
4 Threads: 587184 638971  737094 730850

raid1 [sdb1 sde1] [sdc1 sdd1] chunk=8192
----------------------------------------
           write  rewrite read   reread
1 Thread : 179262 271810  293111 293593
2 Threads: 326260 345276  496189 498250
4 Threads: 333085 308820  686983 679123
8 Threads: 348458 277097  643260 673025

raid10 f2 [sdb1 sdc1 sdd1 sde1] chunk=8192
------------------------------------------
           write  rewrite read   reread
1 Thread : 215560 323921  466460 436195
2 Threads: 288001 304094  611157 586583
4 Threads: 336072 298115  639925 662107
8 Threads: 243053 183969  665743 638512


As you can see, adding a raid1 or raid10 layer already costs a certain
amount of performance, but all within reason. Now for the real problem:


raid5 [sdb1 sdc1 sdd1 sde1] chunk=64, stripe_cache_size=32768
-----------------------------------------------------------------------
           write  rewrite read   reread
1 Thread : 178540 176061  384928 384653
2 Threads: 218113 214308  379950 376312
4 Threads: 225560 160209  359628 359170
8 Threads: 232252 165669  261981 274043

The performance is totally limited by pdflush (>80% CPU during write),
with md0_raid5 eating up a substantial percentage too.


raid5 [sdb1 sdc1 sdd1 sde1] chunk=8192, stripe_cache_size=32768
-----------------------------------------------------------
           write  rewrite read   reread
1 Thread : 171138 185105  424504 428974
2 Threads: 165225 141431  553976 545088
4 Threads: 178189 110153  582999 581266
8 Threads: 177892  99679  568720 594580

This is even stranger: now pdflush uses less CPU (10-70%), but
md0_raid5 is the bottleneck at >95% CPU during write.



Three questions:

1) pdflush is limited to one thread per filesystem. For our usage
that is a bottleneck. Can anything be done there?

2) Why is read performance so lousy with small chunk size?

3) Why does raid5 take so much more CPU time on writes with the larger
chunk size? The amount of data to checksum is the same (same speed),
but the CPU time used goes way up. According to vmstat there are no
read-modify-write cycles in there, just plain continuous writes.
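
(One way to cross-check question 3 is to watch per-member read traffic
while iozone is in its write phase; any sustained reads on sdb-sde during
a pure sequential write would point at read-modify-write. A sketch,
assuming the sysstat iostat is available:

  iostat -x sdb sdc sdd sde 1   # read columns should stay near zero during writes
)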

MfG
        Goswin


* Re: Raid performance problems (pdflush / raid5 eats 100%)
  2007-10-02 15:33 Raid performance problems (pdflush / raid5 eats 100%) Goswin von Brederlow
@ 2007-10-02 16:19 ` Justin Piszcz
  2007-10-02 17:11   ` Goswin von Brederlow
  0 siblings, 1 reply; 3+ messages in thread
From: Justin Piszcz @ 2007-10-02 16:19 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: linux-raid



On Tue, 2 Oct 2007, Goswin von Brederlow wrote:

> Hi,
>
> we (Q-Leap Networks) are in the process of setting up a high-speed
> storage cluster and are having trouble getting proper performance.
>
> Our test system is a machine with two dual-core CPUs and two dual-channel
> UW SCSI controllers connected to two external raid boxes; the speed tests
> below use iozone with 16 GB of data on a Lustre (ldiskfs) filesystem.
> The raid boxes internally run raid6 and are split into 2 partitions,
> one mapped to each SCSI port. Read-ahead is set to 32768.
>
> sdb system controller 1: box 1 controller 1
> sdc system controller 1: box 2 controller 1
> sdd system controller 2: box 1 controller 2
> sde system controller 2: box 2 controller 2
>
> Plain disks: sdb1 sdc1 sdd1 sde1
> --------------------------------
>           write  rewrite read   reread
> 1 Thread : 225204 269084  288718 288219
> 2 Threads: 401154 414525  441005 440564
> 3 Threads: 515818 528943  598863 599455
> 4 Threads: 587184 638971  737094 730850
>
> raid1 [sdb1 sde1] [sdc1 sdd1] chunk=8192
> ----------------------------------------
>           write  rewrite read   reread
> 1 Thread : 179262 271810  293111 293593
> 2 Threads: 326260 345276  496189 498250
> 4 Threads: 333085 308820  686983 679123
> 8 Threads: 348458 277097  643260 673025
>
> raid10 f2 [sdb1 sdc1 sdd1 sde1] chunk=8192
> ------------------------------------------
>           write  rewrite read   reread
> 1 Thread : 215560 323921  466460 436195
> 2 Threads: 288001 304094  611157 586583
> 4 Threads: 336072 298115  639925 662107
> 8 Threads: 243053 183969  665743 638512
>
>
> As you can see, adding a raid1 or raid10 layer already costs a certain
> amount of performance, but all within reason. Now for the real problem:
>
>
> raid5 [sdb1 sdc1 sdd1 sde1] chunk=64, stripe_cache_size=32768
> -----------------------------------------------------------------------
>           write  rewrite read   reread
> 1 Thread : 178540 176061  384928 384653
> 2 Threads: 218113 214308  379950 376312
> 4 Threads: 225560 160209  359628 359170
> 8 Threads: 232252 165669  261981 274043
>
> The performance is totally limited by pdflush (>80% CPU during write),
> with md0_raid5 eating up a substantial percentage too.
>
>
> raid5 [sdb1 sdc1 sdd1 sde1] chunk=8192, stripe_cache_size=32768
> -----------------------------------------------------------
>           write  rewrite read   reread
> 1 Thread : 171138 185105  424504 428974
> 2 Threads: 165225 141431  553976 545088
> 4 Threads: 178189 110153  582999 581266
> 8 Threads: 177892  99679  568720 594580
>
> This is even stranger: now pdflush uses less CPU (10-70%), but
> md0_raid5 is the bottleneck at >95% CPU during write.
>
>
>
> Three questions:
>
> 1) pdflush is limited to one thread per filesystem. For our usage
> that is a bottleneck. Can anything be done there?
>
> 2) Why is read performance so lousy with small chunk size?
>
> 3) Why does raid5 take so much more CPU time on writes with the larger
> chunk size? The amount of data to checksum is the same (same speed),
> but the CPU time used goes way up. According to vmstat there are no
> read-modify-write cycles in there, just plain continuous writes.
>
> MfG
>        Goswin
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Have you tried a 1024k stripe and 16384k stripe_cache_size?

I'd be curious what kind of performance/write speed you get with that 
configuration.
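
(On Linux md those two knobs would be set roughly like this; a sketch only,
device names taken from the original mail:

  mdadm --create /dev/md0 --level=5 --chunk=1024 --raid-devices=4 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  echo 16384 > /sys/block/md0/md/stripe_cache_size
)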

Justin.


* Re: Raid performance problems (pdflush / raid5 eats 100%)
  2007-10-02 16:19 ` Justin Piszcz
@ 2007-10-02 17:11   ` Goswin von Brederlow
  0 siblings, 0 replies; 3+ messages in thread
From: Goswin von Brederlow @ 2007-10-02 17:11 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Goswin von Brederlow, linux-raid

Justin Piszcz <jpiszcz@lucidpixels.com> writes:

> Have you tried a 1024k stripe and 16384k stripe_cache_size?
>
> I'd be curious what kind of performance/write speed you get with that
> configuration.
>
> Justin.

stripe_cache_size is not in KiB of memory but a count of internal
structures (cache entries). So 16384 across our 4 raids comes to ~1.6GB
of RAM. We only have 2GB of RAM in the system, so this is very close to
an OOM just from the stripe cache:

Mem:   2062468k total,  1735048k used,   327420k free,     4096k buffer
Swap:        0k total,        0k used,        0k free,     9924k cached
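
(Rough rule of thumb, assuming each cache entry needs about one page per
member device: memory ~ stripe_cache_size * PAGE_SIZE * members per array,
i.e. 16384 * 4 KiB * 4 = 256 MiB per raid5 before per-stripe bookkeeping,
so several arrays at that setting plus page cache leave little headroom
on a 2 GB machine.)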

1 Thread:
 9442 root      16   0     0    0    0 R   73  0.0   3:33.89 pdflush
 9590 root      10  -5     0    0    0 S   57  0.0   1:29.73 md0_raid5
 9747 root      25   0 28364 1392  184 R   85  0.1   1:06.09 iozone
  257 root      10  -5     0    0    0 D   23  0.0   4:47.50 kswapd1
  256 root      10  -5     0    0    0 S   12  0.0   4:11.17 kswapd0

2 Threads:
  257 root      11  -5     0    0    0 D   27  0.0   4:55.46 kswapd1
 9442 root      16   0     0    0    0 S   23  0.0   4:49.65 pdflush
  256 root      10  -5     0    0    0 S    9  0.0   4:26.57 kswapd0
 9590 root      10  -5     0    0    0 R   40  0.0   2:36.91 md0_raid5
 9596 root      10  -5     0    0    0 S   35  0.0   0:50.62 md1_raid5
 9487 root      15   0     0    0    0 S   35  0.0   0:37.14 pdflush
 9759 root      18   0 28360 1420  216 R   59  0.1   0:11.61 iozone
 9758 root      18   0 28360 1420  216 R   70  0.1   0:10.17 iozone

With 4 raid5s:
           write  rewrite read   reread
1 Thread : 193887 196092  365965 401978  <-- use 1 raid5
2 Threads: 221221 255069  430061 365904  <-- use 2 raid5
4 Threads: 238220 257221  395603 409954  <-- use 4 raid5


With just 1 raid5 (and more free mem):

2 Threads:
 9590 root      10  -5     0    0    0 R   93  0.0   7:45.04 md0_raid5
 9487 root      16   0     0    0    0 S   27  0.0   5:13.43 pdflush
  257 root      10  -5     0    0    0 S    5  0.0   6:28.48 kswapd1
  256 root      10  -5     0    0    0 S    5  0.0   5:33.40 kswapd0
 9818 root      18   0 28364 1416  208 R   56  0.1   0:47.70 iozone
 9819 root      18   0 28364 1416  208 D   62  0.1   0:46.58 iozone
         
           write  rewrite read   reread
1 Thread : 206299 201120  400244 390185
2 Threads: 210888 202785  411502 400532
4 Threads: 200829 145307  400252 395568

MfG
        Goswin

