linux-raid.vger.kernel.org archive mirror
* Reading takes 100% precedence over writes for mdadm+raid5?
@ 2007-12-02 23:09 Justin Piszcz
  2007-12-02 23:18 ` Neil Brown
  2007-12-06 16:11 ` Bill Davidsen
  0 siblings, 2 replies; 10+ messages in thread
From: Justin Piszcz @ 2007-12-02 23:09 UTC (permalink / raw)
  To: linux-raid

root      2206     1  4 Dec02 ?        00:10:37 dd if /dev/zero of 1.out bs 1M
root      2207     1  4 Dec02 ?        00:10:38 dd if /dev/zero of 2.out bs 1M
root      2208     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 3.out bs 1M
root      2209     1  4 Dec02 ?        00:10:45 dd if /dev/zero of 4.out bs 1M
root      2210     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 5.out bs 1M
root      2211     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 6.out bs 1M
root      2212     1  4 Dec02 ?        00:10:30 dd if /dev/zero of 7.out bs 1M
root      2213     1  4 Dec02 ?        00:10:42 dd if /dev/zero of 8.out bs 1M
root      2214     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 9.out bs 1M
root      2215     1  4 Dec02 ?        00:10:37 dd if /dev/zero of 10.out bs 1M
root      3080 24.6  0.0  10356  1672 ?        D    01:22   5:51 dd if /dev/md3 of /dev/null bs 1M

I was curious about this: the 10 dd's (which are writing to the RAID 5)
run fine with no issues, then suddenly they all go into D-state and the
read gets 100% priority.

Is this normal?

# du -sb . ; sleep 300; du -sb .
1115590287487   .
1115590287487   .
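
For reference, a minimal sketch of the workload above (assuming the
array's filesystem is mounted at /r; file names as in the ps listing):

# cd /r
# for i in $(seq 1 10); do dd if=/dev/zero of=$i.out bs=1M & done   # 10 background writers
# dd if=/dev/md3 of=/dev/null bs=1M                                 # one reader, from the raw md device

The writers were started first; the reader was launched later from a
separate shell.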

Here is my raid5 config:

# mdadm -D /dev/md3
/dev/md3:
         Version : 00.90.03
   Creation Time : Sun Dec  2 12:15:20 2007
      Raid Level : raid5
      Array Size : 1465143296 (1397.27 GiB 1500.31 GB)
   Used Dev Size : 732571648 (698.63 GiB 750.15 GB)
    Raid Devices : 3
   Total Devices : 3
Preferred Minor : 3
     Persistence : Superblock is persistent

     Update Time : Sun Dec  2 22:00:54 2007
           State : active
  Active Devices : 3
Working Devices : 3
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 1024K

            UUID : fea48e85:ddd2c33f:d19da839:74e9c858 (local to host box1)
          Events : 0.15

     Number   Major   Minor   RaidDevice State
        0       8       33        0      active sync   /dev/sdc1
        1       8       49        1      active sync   /dev/sdd1
        2       8       65        2      active sync   /dev/sde1



* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-02 23:09 Reading takes 100% precedence over writes for mdadm+raid5? Justin Piszcz
@ 2007-12-02 23:18 ` Neil Brown
  2007-12-02 23:26   ` Justin Piszcz
  2007-12-06 16:11 ` Bill Davidsen
  1 sibling, 1 reply; 10+ messages in thread
From: Neil Brown @ 2007-12-02 23:18 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

On Sunday December 2, jpiszcz@lucidpixels.com wrote:
> 
> I was curious about this: the 10 dd's (which are writing to the RAID 5)
> run fine with no issues, then suddenly they all go into D-state and the
> read gets 100% priority.

So are you saying that the writes completely stalled while the read
was progressing?  How exactly did you measure that?

What kernel version are you running?

> 
> Is this normal?

It shouldn't be.

NeilBrown


* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-02 23:18 ` Neil Brown
@ 2007-12-02 23:26   ` Justin Piszcz
  2007-12-06  1:27     ` Jon Nelson
  0 siblings, 1 reply; 10+ messages in thread
From: Justin Piszcz @ 2007-12-02 23:26 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid



On Mon, 3 Dec 2007, Neil Brown wrote:

> On Sunday December 2, jpiszcz@lucidpixels.com wrote:
>>
>> I was curious about this: the 10 dd's (which are writing to the RAID 5)
>> run fine with no issues, then suddenly they all go into D-state and the
>> read gets 100% priority.
>
> So are you saying that the writes completely stalled while the read
> was progressing?  How exactly did you measure that?
Yes, 100%.

>
> What kernel version are you running?
2.6.23.9

>
>>
>> Is this normal?
>
> It shouldn't be.
>
> NeilBrown
>

I checked again with du -sb: it is still writing, just VERY
slowly:

Before reading dd:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  1  3    104  46088      8 7669416    0    0     0 102832 2683 21132  0 34 43 22
  0  2    104  49140      8 7666724    0    0     0 137800 2662 6690  0 30 45 25
  0  4    104  47344      8 7668884    0    0     0 93312 2637 19454  0 22 40 38
  0  6    104  51292      8 7664688    0    0     0 89404 2538 7901  0 18 31 51
  0  1    104  55476      8 7660424    0    0     0 172852 2669 13607  0 39 47 14
  0  3    104  50428      8 7665036    0    0     0 135916 2711 22523  0 27 52 22
  0  5    104  51836      8 7664152    0    0     0 101504 2491 2784  0 18 42 40
  0  5    104 113468      8 7603016    0    0     0 63788 2568 7528  0 24 24 52
  0  2    104  45780      8 7669364    0    0  1116 177604 2617 13521  0 34 33 33

After reading dd launched:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  2  4    104  45076 2379348 5273588    0    0  7584 17753  548  301  0 17 45 39
  0  5    104  46632 2617352 5043116    0    0 237908     0 2949 2647  0 10 35 54
  1  5    104  45656 2846728 4814900    0    0 229376     0 2768 2360  0 10 36 54
  1  4    104  46128 3104932 4551408    0    0 258308  2748 2918 2559  0 11 36 53
  0  5    104  43804 3338248 4323996    0    0 233212     0 2815 2631  0 10 33 57
  0  5    104  46580 3534856 4125848    0    0 196608     0 2736 2273  0  9 36 55
  0  5    104  46164 3797000 3862936    0    0 262144  1396 2900 2834  0 11 37 51
  1  4    104  46076 4026376 3633740    0    0 229376     0 2978 2586  0 11 37 53
  0  5    104  46252 4288520 3371724    0    0 262144     0 2878 2316  0 11 37 53
  0  5    104  46520 4517896 3142376    0    0 229440     0 2912 2406  0 10 35 56
  0  5    104  47408 4747272 2913156    0    0 229376     0 2903 2619  0 10 36 54
  1  4    104  46800 4976648 2683560    0    0 229376     0 2726 2346  0 10 37 53
  0  5    104  45284 5206024 2456248    0    0 229376     0 2856 2482  0 10 36 54
  0  5    104  46524 5468168 2192136    0    0 262144     0 2956 2750  0 11 36 54
  0  5    104  47284 5697544 1962556    0    0 229376     0 2894 2589  0 10 37 53

It takes a while before it writes anything...

l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r#

.. 5 minutes later ..

l1:/r# du -sb .
1251764138111   .
l1:/r#

l1:/r# du -sb .
1251885887615   .
l1:/r#

l1:/r# ps auxww | grep dd
root      2206  4.5  0.0  10356  1672 ?        D    Dec02  11:46 dd if /dev/zero of 1.out bs 1M
root      2207  4.5  0.0  10356  1672 ?        D    Dec02  11:47 dd if /dev/zero of 2.out bs 1M
root      2208  4.4  0.0  10356  1676 ?        D    Dec02  11:42 dd if /dev/zero of 3.out bs 1M
root      2209  4.5  0.0  10356  1676 ?        D    Dec02  11:53 dd if /dev/zero of 4.out bs 1M
root      2210  4.4  0.0  10356  1672 ?        D    Dec02  11:43 dd if /dev/zero of 5.out bs 1M
root      2211  4.4  0.0  10356  1676 ?        D    Dec02  11:43 dd if /dev/zero of 6.out bs 1M
root      2212  4.4  0.0  10356  1676 ?        D    Dec02  11:38 dd if /dev/zero of 7.out bs 1M
root      2213  4.5  0.0  10356  1672 ?        D    Dec02  11:50 dd if /dev/zero of 8.out bs 1M
root      2214  4.5  0.0  10356  1672 ?        D    Dec02  11:47 dd if /dev/zero of 9.out bs 1M
root      2215  4.4  0.0  10356  1676 ?        D    Dec02  11:44 dd if /dev/zero of 10.out bs 1M
root      3251 25.0  0.0  10356  1676 pts/2    D    02:21   0:14 dd if /dev/md3 of /dev/null bs 1M
root      3282  0.0  0.0   5172   780 pts/2    S+   02:22   0:00 grep dd
l1:/r#

HP RAID controllers (CCISS) allow setting a percentage utilization for
reads vs. writes; does the Linux md/mdadm implementation offer anything
like this as a sysfs or /proc tunable?
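
For what it's worth, a quick way to see the per-array tunables md does
expose under sysfs (as far as I can tell there is no read/write ratio
knob among them, only things like the raid5 stripe cache size):

# ls /sys/block/md3/md/                        # per-array md tunables
# cat /sys/block/md3/md/stripe_cache_size      # raid5 stripe cache, in pages per device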

Justin.



* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-02 23:26   ` Justin Piszcz
@ 2007-12-06  1:27     ` Jon Nelson
  2007-12-06  9:06       ` Justin Piszcz
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Nelson @ 2007-12-06  1:27 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Neil Brown, linux-raid

I saw something really similar while moving some very large (300MB to
4GB) files.
I was really surprised to see actual disk I/O (as measured by dstat)
be really horrible.

-- 
Jon


* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-06  1:27     ` Jon Nelson
@ 2007-12-06  9:06       ` Justin Piszcz
  2007-12-06  9:26         ` David Rees
  0 siblings, 1 reply; 10+ messages in thread
From: Justin Piszcz @ 2007-12-06  9:06 UTC (permalink / raw)
  To: Jon Nelson; +Cc: Neil Brown, linux-raid



On Wed, 5 Dec 2007, Jon Nelson wrote:

> I saw something really similar while moving some very large (300MB to
> 4GB) files.
> I was really surprised to see actual disk I/O (as measured by dstat)
> be really horrible.
>
> -- 
> Jon
>

Any work-arounds, or just don't perform heavy reads at the same time
as writes?

Justin.


* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-06  9:06       ` Justin Piszcz
@ 2007-12-06  9:26         ` David Rees
  2007-12-06  9:27           ` Justin Piszcz
  2007-12-06 13:43           ` Jon Nelson
  0 siblings, 2 replies; 10+ messages in thread
From: David Rees @ 2007-12-06  9:26 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Jon Nelson, Neil Brown, linux-raid

On Dec 6, 2007 1:06 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> On Wed, 5 Dec 2007, Jon Nelson wrote:
>
> > I saw something really similar while moving some very large (300MB to
> > 4GB) files.
> > I was really surprised to see actual disk I/O (as measured by dstat)
> > be really horrible.
>
> Any work-arounds, or just don't perform heavy reads at the same time
> as writes?

What kernel are you using? (Did I miss it in your OP?)

The per-device write throttling in 2.6.24 should help significantly;
have you tried the latest -rc and compared it to your current kernel?
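
As a possible stopgap on the current kernel, the global dirty
thresholds can be lowered. This is purely a sketch, assuming the stall
is ordinary writeback backlog; the values are only illustrative:

# sysctl vm.dirty_background_ratio vm.dirty_ratio    # show current values
# sysctl -w vm.dirty_background_ratio=1              # start background writeback sooner
# sysctl -w vm.dirty_ratio=5                         # block writers on a smaller dirty backlog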

-Dave


* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-06  9:26         ` David Rees
@ 2007-12-06  9:27           ` Justin Piszcz
  2007-12-06 13:43           ` Jon Nelson
  1 sibling, 0 replies; 10+ messages in thread
From: Justin Piszcz @ 2007-12-06  9:27 UTC (permalink / raw)
  To: David Rees; +Cc: Jon Nelson, Neil Brown, linux-raid



On Thu, 6 Dec 2007, David Rees wrote:

> On Dec 6, 2007 1:06 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>> On Wed, 5 Dec 2007, Jon Nelson wrote:
>>
>>> I saw something really similar while moving some very large (300MB to
>>> 4GB) files.
>>> I was really surprised to see actual disk I/O (as measured by dstat)
>>> be really horrible.
>>
>> Any work-arounds, or just don't perform heavy reads at the same time
>> as writes?
>
> What kernel are you using? (Did I miss it in your OP?)
>
> The per-device write throttling in 2.6.24 should help significantly;
> have you tried the latest -rc and compared it to your current kernel?
>
> -Dave
>

2.6.23.9 -- thanks, I will try out the latest -rc or wait for 2.6.24!

Justin.


* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-06  9:26         ` David Rees
  2007-12-06  9:27           ` Justin Piszcz
@ 2007-12-06 13:43           ` Jon Nelson
  1 sibling, 0 replies; 10+ messages in thread
From: Jon Nelson @ 2007-12-06 13:43 UTC (permalink / raw)
  To: David Rees; +Cc: Justin Piszcz, Neil Brown, linux-raid

On 12/6/07, David Rees <drees76@gmail.com> wrote:
> On Dec 6, 2007 1:06 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > On Wed, 5 Dec 2007, Jon Nelson wrote:
> >
> > > I saw something really similar while moving some very large (300MB to
> > > 4GB) files.
> > > I was really surprised to see actual disk I/O (as measured by dstat)
> > > be really horrible.
> >
> > Any work-arounds, or just don't perform heavy reads at the same time
> > as writes?
>
> What kernel are you using? (Did I miss it in your OP?)
>
> The per-device write throttling in 2.6.24 should help significantly;
> have you tried the latest -rc and compared it to your current kernel?

I was using 2.6.22.12 I think (openSUSE kernel).
I can try using pretty much any kernel - I'm preparing to do an
unrelated test using 2.6.24rc4 this weekend. If I remember I'll try to
see what disk I/O looks like there.


-- 
Jon


* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-02 23:09 Reading takes 100% precedence over writes for mdadm+raid5? Justin Piszcz
  2007-12-02 23:18 ` Neil Brown
@ 2007-12-06 16:11 ` Bill Davidsen
  2007-12-09  0:13   ` Jon Nelson
  1 sibling, 1 reply; 10+ messages in thread
From: Bill Davidsen @ 2007-12-06 16:11 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
> root      2206     1  4 Dec02 ?        00:10:37 dd if /dev/zero of 1.out bs 1M
> root      2207     1  4 Dec02 ?        00:10:38 dd if /dev/zero of 2.out bs 1M
> root      2208     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 3.out bs 1M
> root      2209     1  4 Dec02 ?        00:10:45 dd if /dev/zero of 4.out bs 1M
> root      2210     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 5.out bs 1M
> root      2211     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 6.out bs 1M
> root      2212     1  4 Dec02 ?        00:10:30 dd if /dev/zero of 7.out bs 1M
> root      2213     1  4 Dec02 ?        00:10:42 dd if /dev/zero of 8.out bs 1M
> root      2214     1  4 Dec02 ?        00:10:35 dd if /dev/zero of 9.out bs 1M
> root      2215     1  4 Dec02 ?        00:10:37 dd if /dev/zero of 10.out bs 1M
> root      3080 24.6  0.0  10356  1672 ?        D    01:22   5:51 dd if /dev/md3 of /dev/null bs 1M
>
> I was curious about this: the 10 dd's (which are writing to the RAID 5)
> run fine with no issues, then suddenly they all go into D-state and the
> read gets 100% priority.
>
> Is this normal?

I'm jumping back to the start of this thread, because after reading all 
the discussion I noticed that you are mixing apples and oranges here. 
Your write programs are going to files in the filesystem, and your read 
is going against the raw device. That may explain why you are seeing
something I haven't noticed when doing all-filesystem I/O.

I am going to do a large rsync to another filesystem in the next two
days, and I will turn on some measurements when I do. But if you are just
investigating this behavior, perhaps you could retry with a single read 
from a file rather than the device.
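
Something along these lines, for example (paths and sizes are only
illustrative; this assumes the array's filesystem is mounted at /r):

# dd if=/dev/zero of=/r/testfile bs=1M count=4096   # create a test file through the filesystem
# echo 3 > /proc/sys/vm/drop_caches                 # drop the page cache so the read hits the disks
# dd if=/r/testfile of=/dev/null bs=1M              # read back through the filesystem, not /dev/md3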

    [...snip...]

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 




* Re: Reading takes 100% precedence over writes for mdadm+raid5?
  2007-12-06 16:11 ` Bill Davidsen
@ 2007-12-09  0:13   ` Jon Nelson
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Nelson @ 2007-12-09  0:13 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Justin Piszcz, linux-raid

This is what dstat shows me while copying lots of large files around
(ext3), one file at a time.
I've benchmarked the raid itself at around 65-70 MB/s maximum actual
write I/O, so this 3-4 MB/s stuff is pretty bad.

I should note that ALL other I/O suffers horribly, even on other filesystems.
What might the cause be?

I should note: with a larger stripe_cache_size (384 and 512)
performance stays the same; with a smaller one (128) performance
*increases* and stays more steady at 10-13 MB/s.
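
For anyone repeating that comparison: stripe_cache_size is set per
array through sysfs (md device name and values are illustrative):

# cat /sys/block/md0/md/stripe_cache_size        # current value
# echo 128 > /sys/block/md0/md/stripe_cache_size
# echo 384 > /sys/block/md0/md/stripe_cache_size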


----total-cpu-usage---- --dsk/sda-- --dsk/sdb-- --dsk/sdc-- --dsk/sdd-- -dsk/total->
usr sys idl wai hiq siq| read  writ: read  writ: read  writ: read  writ: read  writ>
  1   1  95   3   0   0|  12k 4261B: 106k  125k:  83k  110k:  83k  110k: 283k  348k>
  0   5   0  91   1   2|   0     0 :2384k 4744k:2612k 4412k:2336k 4804k:7332k   14M>
  0   4   0  91   1   3|   0     0 :2352k 4964k:2392k 4812k:2620k 4764k:7364k   14M>
  0   4   0  92   1   3|   0     0 :1068k 3524k:1336k 3184k:1360k 2912k:3764k 9620k>
  0   4   0  92   1   2|   0     0 :2304k 2612k:2128k 2484k:2332k 3028k:6764k 8124k>
  0   4   0  92   1   2|   0     0 :1584k 3428k:1252k 3992k:1592k 3416k:4428k   11M>
  0   3   0  93   0   2|   0     0 :1400k 2364k:1424k 2700k:1584k 2592k:4408k 7656k>
  0   4   0  93   1   2|   0     0 :1764k 3084k:1820k 2972k:1796k 2396k:5380k 8452k>
  0   4   0  92   2   3|   0     0 :1984k 3736k:1772k 4024k:1792k 4524k:5548k   12M>
  0   4   0  93   1   2|   0     0 :1852k 3860k:1840k 3408k:1696k 3648k:5388k   11M>
  0   4   0  93   0   2|   0     0 :1328k 2500k:1640k 2348k:1672k 2128k:4640k 6976k>
  0   4   0  92   0   4|   0     0 :1624k 3944k:2080k 3432k:1760k 3704k:5464k   11M>
  0   1   0  97   1   2|   0     0 :1480k 1340k: 976k 1564k:1268k 1488k:3724k 4392k>
  0   4   0  92   1   2|   0     0 :1320k 2676k:1608k 2548k: 968k 2572k:3896k 7796k>
  0   2   0  96   1   1|   0     0 :1856k 1808k:1752k 1988k:1752k 1600k:5360k 5396k>
  0   4   0  92   2   1|   0     0 :1360k 2560k:1240k 2788k:1580k 2940k:4180k 8288k>
  0   2   0  97   1   2|   0     0 :1928k 1456k:1628k 2080k:1488k 2308k:5044k 5844k>
  1   3   0  94   2   2|   0     0 :1432k 2156k:1320k 1840k: 936k 1072k:3688k 5068k>
  0   3   0  93   2   2|   0     0 :1760k 2164k:1440k 2384k:1276k 2972k:4476k 7520k>
  0   3   0  95   1   2|   0     0 :1088k 1064k: 896k 1424k:1152k 992k:3136k 3480k>
  0   0   0  96   0   2|   0     0 : 976k  888k: 632k 1120k:1016k 968k:2624k 2976k>
  0   2   0  94   1   2|   0     0 :1120k 1864k: 964k 1776k:1060k 1856k:3144k 5496k>

-- 
Jon

