From: Jim Duchek <jim.duchek@gmail.com>
To: Mark Knecht <markknecht@gmail.com>
Cc: Roger Heflin <rogerheflin@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: Array 'freezes' for some time after large writes?
Date: Wed, 31 Mar 2010 10:25:29 -0600
Message-ID: <s2xdead81ad1003310925z4bec9e0cod5f39421f9d06299@mail.gmail.com>
In-Reply-To: <p2m5bdc1c8b1003310912i7215b7abh88989839a76feb84@mail.gmail.com>
Agreed, playing with some of these settings appears to clear the
problem up, at least for the cases in which I tend to trigger it.
Much obliged for the help!
Jim
On 31 March 2010 10:12, Mark Knecht <markknecht@gmail.com> wrote:
> On Tue, Mar 30, 2010 at 6:35 PM, Roger Heflin <rogerheflin@gmail.com> wrote:
>> Jim Duchek wrote:
>>>
>>> Hi all. Regularly after a large write to the disk (untarring a very
>>> large file, etc), my RAID5 will 'freeze' for a period of time --
>>> perhaps around a minute. My system is completely responsive otherwise
>>> during this time, with the exception of anything that is attempting to
>>> read or write from the array -- it's as if any file descriptors simply
>>> block.
> <SNIP>
>>
>> In /etc/sysctl.conf or with "sysctl -a|grep vm.dirty" check these two
>> settings:
>> vm.dirty_background_ratio = 5
>> vm.dirty_ratio = 6
>>
>> Default will be something like 40 for the second one and 10 for the first
>> one.
>>
>> 40% is how much memory the kernel lets get dirty with write data. 10%, or
>> whatever the bottom number is, is how low it has to go once it starts
>> cleaning up before it lets anyone else write again (i.e. it freezes all
>> writes and massively slows down reads).
>>
>> I set the values to the above. In older kernels 5 is the minimum value;
>> newer ones may allow lower. I don't believe the limits are well documented,
>> and if you set it lower, the older kernels silently use the minimum
>> internally; you won't see that with a "sysctl -a" check. So my machine
>> could freeze for as long as it takes to write 1% of memory out to disk,
>> which with 8GB is 81MB, and that takes at most a second or two at
>> 60MB/second or so. If you have 8GB and the difference between the two is
>> set to 10%, it can take 10+ seconds. I don't remember the default, but the
>> larger it is, the bigger the freeze will be.
>>
>> And this depends on the underlying disk speed: if the underlying disk is
>> slower, the time it takes to write out that amount of data is longer and
>> things are uglier. File copies do a good job of causing this.
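
The arithmetic in Roger's estimate can be sketched as below. This is only a rough model, assuming the stall lasts as long as it takes to write out the gap between the two ratios at a steady disk speed. The helper name stall_seconds and its three arguments are made up for illustration; they are not part of any existing tool.

```shell
# Rough stall estimate: time to write (pct% of RAM) at a given disk speed.
# stall_seconds is a hypothetical helper, not an existing command.
stall_seconds() {
    # $1 = RAM in MB, $2 = percent gap between the two ratios, $3 = disk MB/s
    awk -v mem="$1" -v pct="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", mem * pct / 100 / speed }'
}

stall_seconds 8192 1 60    # 1% gap on 8GB at 60MB/s, prints 1.4
stall_seconds 8192 10 60   # 10% gap on the same machine, prints 13.7
```

Real stalls will vary with the workload and with how the array handles the writes, so the numbers are only an order-of-magnitude guide.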
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> Very interesting Roger. Thanks.
>
> I did some reading on a couple of web sites and then did some testing.
> I found for the sort of jobs I do that create and write data, as an
> example compiling and installing MythTV, these settings have a big
> effect on the percentage of time my system drops into these 100%wa, 0%
> CPU type of states. The default setting on my system was 10/20 and
> that tended to create this state quite a lot. 3/40 reduced it by
> probably 50-75%, while 3/70 seemed to eliminate it until the end of
> the build where the kernel/compiler is presumably forcing it out to
> disk because the job is finishing.
>
> One page I read mentioned data centers using a very good UPS and
> internal power supply and then running it at 1/100. I think the basic
> idea is that if we lose power there should be enough time to flush all
> this stuff to disk before the power completely drops out but up until
> that time let the kernel take care of things completely.
>
> Experimentally what I see is that when I cross above the lower value
> it isn't that nothing gets written, but more that the kernel sort of
> opportunistically starts writing it to disk without letting it get too
> much in the way of running programs, and then when the higher value
> seems to get crossed the system goes 100% wait while it pushes the
> data out and is waiting for the disk. I used the command
>
> grep -A 1 dirty /proc/vmstat
>
> to watch a compile taking place and looked when it was 100%
> user/system and then also when it went to 100% wait.
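
A small variation on the grep above, as a sketch: it pulls out just the nr_dirty and nr_writeback counters and converts them from pages to MB. The helper name dirty_mb is hypothetical, and the conversion assumes 4 KB pages, which may not hold on every architecture.

```shell
# dirty_mb is a hypothetical helper; it assumes 4 KB pages.
dirty_mb() {
    # $1 = vmstat-format file, defaults to /proc/vmstat
    awk '$1 == "nr_dirty" || $1 == "nr_writeback" {
             printf "%s %.1f MB\n", $1, $2 * 4 / 1024
         }' "${1:-/proc/vmstat}"
}

# Sample once a second while the build runs (Ctrl-C to stop):
# while sleep 1; do dirty_mb; done
```

Watching the two numbers side by side should make it easier to see when the dirty total approaches the higher threshold.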
>
> Some additional reading seems to suggest tuning things like
>
> vm.overcommit_ratio
>
> and possibly changing the I/O scheduler
>
> keeper ~ # cat /sys/block/sda/queue/scheduler
> noop deadline [cfq]
>
> or changing the number of requests
>
> keeper ~ # cat /sys/block/sda/queue/nr_requests
> 128
>
> or read ahead values
>
> keeper ~ # blockdev --getra /dev/sda
> 256
>
> I haven't played with any of those.
>
> Based on this info I think it's worth my time to try a new RAID
> install and see if I'm more successful.
>
> Thanks very much for your insights and help!
>
> Cheers,
> Mark
>
>
>
> keeper ~ # vi /etc/sysctl.conf
>
> vm.dirty_background_ratio = 10
> vm.dirty_ratio = 20
>
> keeper ~ # sysctl -p
>
> real 8m50.667s
> user 30m6.995s
> sys 1m30.605s
> keeper ~ #
>
>
> keeper ~ # vi /etc/sysctl.conf
>
> vm.dirty_background_ratio = 3
> vm.dirty_ratio = 40
>
> keeper ~ # sysctl -p
>
> keeper ~ # time emerge -DuN mythtv
> <SNIP>
> real 8m59.401s
> user 30m9.980s
> sys 1m30.303s
> keeper ~ #
>
>
> keeper ~ # vi /etc/sysctl.conf
>
> vm.dirty_background_ratio = 3
> vm.dirty_ratio = 70
>
> keeper ~ # time emerge -DuN mythtv
> <SNIP>
> real 8m52.272s
> user 30m0.889s
> sys 1m30.609s
> keeper ~ #
>
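
To compare the three "real" times in Mark's runs, they can be converted to plain seconds. A minimal sketch follows; to_seconds is a hypothetical helper that assumes the "XmY.YYYs" form printed by the shell's time keyword.

```shell
# to_seconds is a hypothetical helper: it converts the "XmY.YYYs" form
# printed by the shell's time keyword into plain seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# "real" times for the 10/20, 3/40 and 3/70 runs quoted above
for t in 8m50.667s 8m59.401s 8m52.272s; do
    to_seconds "$t"
done
```

The three results are within about nine seconds of each other on a roughly nine-minute build, which suggests the settings change how often the system stalls rather than the total build time. With one run per setting, that is only a tentative reading.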