linux-raid.vger.kernel.org archive mirror
From: Jim Duchek <jim.duchek@gmail.com>
To: Mark Knecht <markknecht@gmail.com>
Cc: Roger Heflin <rogerheflin@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: Array 'freezes' for some time after large writes?
Date: Wed, 31 Mar 2010 10:25:29 -0600
Message-ID: <s2xdead81ad1003310925z4bec9e0cod5f39421f9d06299@mail.gmail.com>
In-Reply-To: <p2m5bdc1c8b1003310912i7215b7abh88989839a76feb84@mail.gmail.com>

Agreed, playing with some of these settings appears to clear the
problem up, at least for the cases in which I tend to trigger it.
Much obliged for the help!

Jim


On 31 March 2010 10:12, Mark Knecht <markknecht@gmail.com> wrote:
> On Tue, Mar 30, 2010 at 6:35 PM, Roger Heflin <rogerheflin@gmail.com> wrote:
>> Jim Duchek wrote:
>>>
>>> Hi all.  Regularly after a large write to the disk (untarring a very
>>> large file, etc), my RAID5 will 'freeze' for a period of time --
>>> perhaps around a minute.  My system is completely responsive otherwise
>>> during this time, with the exception of anything that is attempting to
>>> read or write from the array -- it's as if any file descriptors simply
>>> block.
> <SNIP>
>>
>> In /etc/sysctl.conf or with "sysctl -a|grep vm.dirty" check these two
>> settings:
>> vm.dirty_background_ratio = 5
>> vm.dirty_ratio = 6
>>
>> The defaults will be something like 40 for the second one and 10 for the
>> first one.
>>
>> 40% is how much memory the kernel lets get dirty with write data; 10%, or
>> whatever the bottom number is, is how low it has to go, once the kernel
>> starts cleaning up, before anyone else is allowed to write again (i.e. it
>> freezes all writes and massively slows down reads).
>>
>> I set the values to the above. In older kernels 5 is the minimum value;
>> newer ones may allow lower. I don't believe the limits are well documented,
>> and if you set a lower value, older kernels silently clamp it to the
>> minimum internally, which you won't see in the "sysctl -a" output. So on
>> my machine a freeze could last as long as it takes to write 1% of memory
>> out to disk, which with 8GB is about 81MB and takes at most a second or two
>> at 60MB/second or so. If you have 8GB and the difference between the two is
>> set to 10%, it can take 10+ seconds. I don't remember the default, but the
>> larger the difference, the longer the freeze will be.
>>
>> And this depends on the underlying disk speed: if the underlying disk is
>> slower, the time it takes to write out that amount of data is longer and
>> things are uglier. File copies do a good job of causing this.
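
The arithmetic above can be checked with a rough shell sketch. It assumes, as the message does, that a stall lasts about as long as it takes to write out the gap between the two ratios, and that the disk sustains a steady rate. The function name and its arguments are made up for illustration, and integer division rounds the results down.

```shell
#!/bin/sh
# Rough estimate only: seconds needed to flush the gap between
# vm.dirty_ratio and vm.dirty_background_ratio at a steady disk rate.
# estimate_stall is a hypothetical helper, not an existing tool.
# Usage: estimate_stall <memory in MB> <gap in percent> <disk rate in MB/s>
estimate_stall() {
    mem_mb=$1
    gap_pct=$2
    disk_mb_s=$3
    dirty_mb=$(( mem_mb * gap_pct / 100 ))   # data that has to reach the disk
    secs=$(( dirty_mb / disk_mb_s ))         # integer division, rounds down
    echo "${dirty_mb} MB to flush, about ${secs} s at ${disk_mb_s} MB/s"
}

estimate_stall 8192 1 60    # 8GB of RAM, 1% gap, 60MB/s
estimate_stall 8192 10 60   # same machine with a 10% gap
```

A slower disk goes in as a smaller third argument, which lengthens the estimate in direct proportion.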
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> Very interesting Roger. Thanks.
>
> I did some reading on a couple of web sites and then did some testing.
> I found that for the sort of jobs I do that create and write data, for
> example compiling and installing MythTV, these settings have a big
> effect on the percentage of time my system drops into these 100%wa, 0%
> CPU type of states. The default setting on my system was 10/20 and
> that tended to create this state quite a lot. 3/40 reduced it by
> probably 50-75%, while 3/70 seemed to eliminate it until the end of
> the build where the kernel/compiler is presumably forcing it out to
> disk because the job is finishing.
>
> One page I read mentioned data centers using a very good UPS and
> internal power supply and then running it at 1/100. I think the basic
> idea is that if we lose power there should be enough time to flush all
> this stuff to disk before the power completely drops out but up until
> that time let the kernel take care of things completely.
>
> Experimentally what I see is that when I cross above the lower value
> it isn't that nothing gets written, but more that the kernel sort of
> opportunistically starts writing it to disk without letting it get too
> much in the way of running programs, and then when the higher value
> seems to get crossed the system goes 100% wait while it pushes the
> data out and is waiting for the disk. I used the command
>
> grep -A 1 dirty /proc/vmstat
>
> to watch a compile taking place and looked when it was 100%
> user/system and then also when it went to 100% wait.
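
For watching the counters over time rather than once, something like the following sketch may help. It assumes the nr_dirty and nr_writeback fields of /proc/vmstat, which are the lines the grep above prints. The helper name is hypothetical, and the optional file argument exists only so the parsing can be tried on a saved copy.

```shell
#!/bin/sh
# Print the nr_dirty and nr_writeback counters (in pages) from a
# vmstat-style file; defaults to /proc/vmstat on a Linux system.
# show_dirty is a hypothetical helper name.
show_dirty() {
    awk '$1 == "nr_dirty" || $1 == "nr_writeback" { print $1, $2 }' \
        "${1:-/proc/vmstat}"
}

# To sample once per second during a build (stop with Ctrl-C):
#   while sleep 1; do show_dirty; echo; done
```

The values are page counts, so multiply by the page size (commonly 4kB) to get an amount of memory.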
>
> Some additional reading seems to suggest tuning things like
>
> vm.overcommit_ratio
>
> and possibly changing the I/O scheduler
>
> keeper ~ # cat /sys/block/sda/queue/scheduler
> noop deadline [cfq]
>
> or changing the number of requests
>
> keeper ~ # cat /sys/block/sda/queue/nr_requests
> 128
>
> or read ahead values
>
> keeper ~ # blockdev --getra /dev/sda
> 256
>
> I haven't played with any of those.
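
In the scheduler file the bracketed entry is the one currently in use. A small sketch for pulling it out, for instance to log it alongside test results, could look like this. The helper name is hypothetical and the device name sda is simply the one from the message.

```shell
#!/bin/sh
# Extract the active I/O scheduler, i.e. the bracketed entry, from a line
# in the format of /sys/block/<dev>/queue/scheduler read on standard input.
# active_scheduler is a hypothetical helper name.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

echo 'noop deadline [cfq]' | active_scheduler   # prints: cfq

# On a real system (device name taken from the message above):
#   active_scheduler < /sys/block/sda/queue/scheduler
# Switching schedulers needs root, for example:
#   echo deadline > /sys/block/sda/queue/scheduler
```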
>
> Based on this info I think it's worth my time to try a new RAID
> install and see if I'm more successful.
>
> Thanks very much for your insights and help!
>
> Cheers,
> Mark
>
>
>
> keeper ~ # vi /etc/sysctl.conf
>
> vm.dirty_background_ratio = 10
> vm.dirty_ratio = 20
>
> keeper ~ # sysctl -p
>
> real    8m50.667s
> user    30m6.995s
> sys     1m30.605s
> keeper ~ #
>
>
> keeper ~ # vi /etc/sysctl.conf
>
> vm.dirty_background_ratio = 3
> vm.dirty_ratio = 40
>
> keeper ~ # sysctl -p
>
> keeper ~ # time emerge -DuN mythtv
> <SNIP>
> real    8m59.401s
> user    30m9.980s
> sys     1m30.303s
> keeper ~ #
>
>
> keeper ~ # vi /etc/sysctl.conf
>
> vm.dirty_background_ratio = 3
> vm.dirty_ratio = 70
>
> keeper ~ # time emerge -DuN mythtv
> <SNIP>
> real    8m52.272s
> user    30m0.889s
> sys     1m30.609s
> keeper ~ #
>
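
To compare the three runs above more easily, the "real" values can be converted to plain seconds. This is only a sketch: the helper name is hypothetical, and it assumes the NmS.SSSs format that the shell's time keyword printed in the transcript.

```shell
#!/bin/sh
# Convert a "real" value printed by time, such as 8m50.667s, to seconds.
# to_seconds is a hypothetical helper name.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, part, "m")          # part[1] = minutes, part[2] = "50.667s"
        sub(/s$/, "", part[2])       # drop the trailing "s"
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# Wall-clock times of the three runs quoted above (10/20, 3/40, 3/70).
for t in 8m50.667s 8m59.401s 8m52.272s; do
    to_seconds "$t"
done
```

The three runs finish within about nine seconds of each other, which suggests the ratios mainly change how the waiting is distributed during the build and not the total build time.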
