* Bonnie++ with 1024k stripe SW/RAID5 causes kernel to goto D-state
From: Justin Piszcz @ 2007-09-29 17:08 UTC
To: linux-kernel; +Cc: linux-raid, xfs
Kernel: 2.6.23-rc8 (older kernels do this as well)
When running the following command:
/usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:100000:16:64
It hangs unless I increase various md/raid parameters such as
stripe_cache_size.
# ps auxww | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 276 0.0 0.0 0 0 ? D 12:14 0:00 [pdflush]
root 277 0.0 0.0 0 0 ? D 12:14 0:00 [pdflush]
root 1639 0.0 0.0 0 0 ? D< 12:14 0:00 [xfsbufd]
root 1767 0.0 0.0 8100 420 ? Ds 12:14 0:00
root 2895 0.0 0.0 5916 632 ? Ds 12:15 0:00 /sbin/syslogd -r
See the bottom for more details.
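The short version of the workaround, using the array device from the
details below (/dev/md3):
# md's default stripe_cache_size is 256 entries; check the current value:
cat /sys/block/md3/md/stripe_cache_size
# Raising it (as root) is the step that gets the array moving again:
echo 16384 > /sys/block/md3/md/stripe_cache_size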
Is this normal? Does md only work without tuning up to a certain stripe
size? I use a RAID 5 with a 1024k stripe, which works fine with the
optimizations applied, but if I just boot the system and run bonnie++
without them, it hangs in D-state. When I run the optimizations, it exits
D-state, which is pretty weird.
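For scale, the kernel's md documentation gives the stripe cache footprint
as page_size * nr_disks * stripe_cache_size, so on this 10-disk array the
default is tiny next to the tuned value:
# raid5 stripe cache memory (formula from Documentation/md.txt):
echo $(( 4096 * 10 * 256 / 1024 / 1024 ))     # default 256 entries: 10 MiB
echo $(( 4096 * 10 * 16384 / 1024 / 1024 ))   # tuned 16384 entries: 640 MiB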
(Again: without the optimization script below, bonnie++ hangs in D-state
until it is run.)
Optimization script:
#!/bin/bash
# source profile
. /etc/profile
# Tell user what's going on.
echo "Optimizing RAID Arrays..."
# Define DISKS.
cd /sys/block
DISKS=$(/bin/ls -1d sd[a-z])
# This step must come first.
# See: http://www.3ware.com/KB/article.aspx?id=11050
echo "Setting max_sectors_kb to 128 KiB"
for i in $DISKS
do
echo "Setting /dev/$i to 128 KiB..."
echo 128 > /sys/block/"$i"/queue/max_sectors_kb
done
# This step comes next.
# Note: nr_requests is a count of queued requests, not a size in KiB.
echo "Setting nr_requests to 512"
for i in $DISKS
do
echo "Setting /dev/$i nr_requests to 512..."
echo 512 > /sys/block/"$i"/queue/nr_requests
done
# Set read-ahead; --setra takes 512-byte sectors, so 65536 = 32 MiB.
echo "Setting read-ahead to 32 MiB for /dev/md3"
blockdev --setra 65536 /dev/md3
# Set stripe_cache_size for RAID5. The value is in entries; the memory
# used is page_size * nr_disks * entries, i.e. 4 KiB * 10 * 16384 =
# 640 MiB here.
echo "Setting stripe_cache_size to 16384 entries for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size
# Set minimum and maximum raid resync speed to 30 MB/s
# (the sysfs values are in KiB/s).
echo "Setting minimum and maximum resync speed to 30 MB/s..."
for md in md0 md1 md2 md3
do
echo 30000 > /sys/block/$md/md/sync_speed_min
echo 30000 > /sys/block/$md/md/sync_speed_max
done
# Disable NCQ on all disks.
echo "Disabling NCQ on all disks..."
for i in $DISKS
do
echo "Disabling NCQ on $i"
echo 1 > /sys/block/"$i"/device/queue_depth
done
--
Once this runs, everything works fine again.
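A quick spot-check that the settings took effect (expected read-backs:
128, 512, 16384, and 65536):
grep . /sys/block/sd[a-z]/queue/max_sectors_kb
grep . /sys/block/sd[a-z]/queue/nr_requests
cat /sys/block/md3/md/stripe_cache_size
blockdev --getra /dev/md3   # read-ahead in 512-byte sectors (65536 = 32 MiB)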
--
# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
Creation Time : Wed Aug 22 10:38:53 2007
Raid Level : raid5
Array Size : 1318680576 (1257.59 GiB 1350.33 GB)
Used Dev Size : 146520064 (139.73 GiB 150.04 GB)
Raid Devices : 10
Total Devices : 10
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Sat Sep 29 13:05:15 2007
State : active, resyncing
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 1024K
Rebuild Status : 8% complete
UUID : e37a12d1:1b0b989a:083fb634:68e9eb49 (local to host p34.internal.lan)
Events : 0.4211
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
3 8 81 3 active sync /dev/sdf1
4 8 97 4 active sync /dev/sdg1
5 8 113 5 active sync /dev/sdh1
6 8 129 6 active sync /dev/sdi1
7 8 145 7 active sync /dev/sdj1
8 8 161 8 active sync /dev/sdk1
9 8 177 9 active sync /dev/sdl1
--
NOTE: This bug is reproducible every time:
Example:
$ /usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:100000:16:64
Writing with putc()...
It writes for 4-5 minutes and then... SILENCE + D-state. I was too late
this time :(
$ ps auxww | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 276 1.2 0.0 0 0 ? D 12:50 0:03 [pdflush]
root 2901 0.0 0.0 5916 632 ? Ds 12:50 0:00 /sbin/syslogd -r
user 4571 48.0 0.0 11644 1084 pts/1 D+ 12:51 1:55 /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:100000:16:64
root 4612 1.0 0.0 0 0 ? D 12:52 0:01 [pdflush]
root 4624 5.0 0.0 40964 7436 ? D 12:55 0:00 /usr/bin/perl -w /app/rrd-cputemp/bin/rrd_cputemp.pl
root 4684 0.0 0.0 31968 1416 ? D 12:55 0:00 /usr/bin/rateup /var/www/monitor/mrtg/ eth0 1191084902 -Z u 265975 843609 125000000 c #00cc00 #0000ff #006600 #ff00ff k 1000 i /var/www/monitor/mrtg/eth0-day.png -125000000 -125000000 400 100 1 1 1 300 0 4 1 %Y-%m-%d %H:%M 0 i /var/www/monitor/mrtg/eth0-week.png -125000000 -125000000 400 100 1 1 1 1800 0 4 1 %Y-%m-%d %H:%M 0 i /var/www/monitor/mrtg/eth0-month.png -125000000 -125000000 400 100 1 1 1 7200 0 4 1 %Y-%m-%d %H:%M 0
root 4686 0.0 0.0 4420 932 ? D 12:55 0:00 /usr/sbin/hddtemp -n /dev/sdf
user 4688 0.0 0.0 4232 800 pts/5 S+ 12:55 0:00 grep --color D
$
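Should it wedge again, the kernel wait channel (WCHAN) shows where each
D-state task is blocked; tasks waiting on raid5 stripes would typically
sit in get_active_stripe:
# List D-state tasks along with their kernel wait channel:
ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'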
If you are not already logged in as root, it is sometimes too late to su
to root and run the optimizations:
$ su -
Password:
<hang forever>
Justin.
* Re: Bonnie++ with 1024k stripe SW/RAID5 causes kernel to goto D-state
From: Chris Snook @ 2007-09-29 18:33 UTC
To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs
Justin Piszcz wrote:
> Kernel: 2.6.23-rc8 (older kernels do this as well)
>
> When running the following command:
> /usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:100000:16:64
>
> It hangs unless I increase various md/raid parameters such as
> stripe_cache_size.
>
> # ps auxww | grep D
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 276 0.0 0.0 0 0 ? D 12:14 0:00 [pdflush]
> root 277 0.0 0.0 0 0 ? D 12:14 0:00 [pdflush]
> root 1639 0.0 0.0 0 0 ? D< 12:14 0:00 [xfsbufd]
> root 1767 0.0 0.0 8100 420 ? Ds 12:14 0:00
> root 2895 0.0 0.0 5916 632 ? Ds 12:15 0:00 /sbin/syslogd -r
>
> See the bottom for more details.
>
> Is this normal? Does md only work without tuning up to a certain stripe
> size? I use a RAID 5 with a 1024k stripe, which works fine with the
> optimizations applied, but if I just boot the system and run bonnie++
> without them, it hangs in D-state. When I run the optimizations, it exits
> D-state, which is pretty weird.
Not at all. 1024k stripes are way outside the norm. If you do something way
outside the norm, and don't tune for it in advance, don't be terribly surprised
when something like bonnie++ brings your box to its knees.
That's not to say we couldn't make md auto-tune itself more intelligently, but
this isn't really a bug. With a sufficiently huge amount of RAM, you'd be able
to dynamically allocate the buffers that you're not pre-allocating with
stripe_cache_size, but bonnie++ is eating that up in this case.
-- Chris