* Re: [PATCH 00/23] per device dirty throttling -v8
From: Ray Lee @ 2007-08-04 16:01 UTC (permalink / raw)
To: david@lang.hm
Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, linux-mm,
linux-kernel, miklos, akpm, neilb, dgc, tomoki.sekiyama.qu,
nikita, trond.myklebust, yingchao.zhou, richard, netdev
(adding netdev cc:)
On 8/4/07, david@lang.hm <david@lang.hm> wrote:
> On Sat, 4 Aug 2007, Ingo Molnar wrote:
>
> > * Ingo Molnar <mingo@elte.hu> wrote:
> >
> >> There are positive reports in the never-ending "my system crawls like
> >> an XT when copying large files" bugzilla entry:
> >>
> >> http://bugzilla.kernel.org/show_bug.cgi?id=7372
> >
> > i forgot this entry:
> >
> > " We recently upgraded our office to gigabit Ethernet and got some big
> > AMD64 / 3ware boxes for file and vmware servers... only to find them
> > almost useless under any kind of real load. I've built some patched
> > 2.6.21.6 kernels (using the bdi throttling patch you mentioned) to
> > see if our various Debian Etch boxes run better. So far my testing
> > shows a *great* improvement over the stock Debian 2.6.18 kernel on
> > our configurations. "
> >
> > and bdi has been in -mm in the past i think, so we also know (to a
> > certain degree) that it does not hurt those workloads that are fine
> > either.
> >
> > [ my personal interest in this is the following regression: every time i
> > start a large kernel build with DEBUG_INFO on a quad-core 4GB RAM box,
> > i get up to 30 seconds complete pauses in Vim (and most other tasks),
> > during plain editing of the source code. (which happens when Vim tries
> > to write() to its swap/undo-file.) ]
>
> I have an issue that sounds like it's related.
>
> I've got a syslog server with two Opteron 246 CPUs, 16G RAM, 2x140G
> 15k rpm drives (Fusion MPT hardware mirroring), and 16x500G 7200rpm SATA
> drives on 3ware 9500 cards (software raid6), running 2.6.20.3 with HZ set
> at default and preempt turned off.
>
> I have syslog doing buffered writes to the SCSI drives and every 5 min a
> cron job copies the data to the raid array.
>
> I've found that if I do anything significant on the large raid array, the
> system loses a significant amount of the UDP syslog traffic, even
> though there should be plenty of RAM and CPU (and the spindles involved
> in the writes are not being touched); even a grep can cause up to 40%
> losses in the syslog traffic. I've experimented with nice levels (nicing
> down the grep and nicing up the syslogd) without a noticeable effect on the
> losses.
>
> I've been planning to try a new kernel with HZ=1000 to see if that would
> help, and after that to experiment with the various preempt settings, but it
> sounds like the per-device queues may actually be more relevant to the
> problem.
>
> What would you suggest I test, and in what order and combination?
At least on a surface level, your report has some similarities to
http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller
mentions several things he tried without effect:
< - I increased the max allowed receive buffer through
< /proc/sys/net/core/rmem_max and the application calls the right
< syscall. "netstat -su" does not show any "packet receive errors".
<
< - After getting "kernel: swapper: page allocation failure.
< order:0, mode:0x20", I increased /proc/sys/vm/min_free_kbytes
<
< - ixgb.txt in kernel network documentation suggests to increase
< net.core.netdev_max_backlog to 300000. This did not help.
<
< - I also had to increase net.core.optmem_max, because the default
< value was too small for 700 multicast groups.
As they're all pretty simple to test, it may be worthwhile to give
them a shot just to rule things out.
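For reference, something like the following would exercise all four knobs in
one go (a sketch only -- the values are placeholders rather than tuned
recommendations, so note the current settings first in case you want to
revert):

# record the current settings so the experiment is reversible
sysctl net.core.rmem_max net.core.optmem_max \
       net.core.netdev_max_backlog vm.min_free_kbytes
# then raise them for the test run (placeholder values, adjust to taste;
# 300000 for the backlog is the ixgb.txt suggestion quoted above)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.optmem_max=204800
sysctl -w net.core.netdev_max_backlog=300000
sysctl -w vm.min_free_kbytes=65536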
Ray
* Re: [PATCH 00/23] per device dirty throttling -v8
From: david @ 2007-08-04 17:15 UTC (permalink / raw)
To: Ray Lee
Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, linux-mm,
linux-kernel, miklos, akpm, neilb, dgc, tomoki.sekiyama.qu,
nikita, trond.myklebust, yingchao.zhou, richard, netdev
On Sat, 4 Aug 2007, Ray Lee wrote:
> (adding netdev cc:)
>
> On 8/4/07, david@lang.hm <david@lang.hm> wrote:
>> On Sat, 4 Aug 2007, Ingo Molnar wrote:
>>
>>> * Ingo Molnar <mingo@elte.hu> wrote:
>>>
>>>> There are positive reports in the never-ending "my system crawls like
>>>> an XT when copying large files" bugzilla entry:
>>>>
>>>> http://bugzilla.kernel.org/show_bug.cgi?id=7372
>>>
>>> i forgot this entry:
>>>
>>> " We recently upgraded our office to gigabit Ethernet and got some big
>>> AMD64 / 3ware boxes for file and vmware servers... only to find them
>>> almost useless under any kind of real load. I've built some patched
>>> 2.6.21.6 kernels (using the bdi throttling patch you mentioned) to
>>> see if our various Debian Etch boxes run better. So far my testing
>>> shows a *great* improvement over the stock Debian 2.6.18 kernel on
>>> our configurations. "
>>>
>>> and bdi has been in -mm in the past i think, so we also know (to a
>>> certain degree) that it does not hurt those workloads that are fine
>>> either.
>>>
>>> [ my personal interest in this is the following regression: every time i
>>> start a large kernel build with DEBUG_INFO on a quad-core 4GB RAM box,
>>> i get up to 30 seconds complete pauses in Vim (and most other tasks),
>>> during plain editing of the source code. (which happens when Vim tries
>>> to write() to its swap/undo-file.) ]
>>
>> I have an issue that sounds like it's related.
>>
>> I've got a syslog server with two Opteron 246 CPUs, 16G RAM, 2x140G
>> 15k rpm drives (Fusion MPT hardware mirroring), and 16x500G 7200rpm SATA
>> drives on 3ware 9500 cards (software raid6), running 2.6.20.3 with HZ set
>> at default and preempt turned off.
>>
>> I have syslog doing buffered writes to the SCSI drives and every 5 min a
>> cron job copies the data to the raid array.
>>
>> I've found that if I do anything significant on the large raid array, the
>> system loses a significant amount of the UDP syslog traffic, even
>> though there should be plenty of RAM and CPU (and the spindles involved
>> in the writes are not being touched); even a grep can cause up to 40%
>> losses in the syslog traffic. I've experimented with nice levels (nicing
>> down the grep and nicing up the syslogd) without a noticeable effect on the
>> losses.
>>
>> I've been planning to try a new kernel with HZ=1000 to see if that would
>> help, and after that to experiment with the various preempt settings, but it
>> sounds like the per-device queues may actually be more relevant to the
>> problem.
>>
>> What would you suggest I test, and in what order and combination?
>
> At least on a surface level, your report has some similarities to
> http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller
> mentions several things he tried without effect:
>
> < - I increased the max allowed receive buffer through
> < /proc/sys/net/core/rmem_max and the application calls the right
> < syscall. "netstat -su" does not show any "packet receive errors".
> <
> < - After getting "kernel: swapper: page allocation failure.
> < order:0, mode:0x20", I increased /proc/sys/vm/min_free_kbytes
> <
> < - ixgb.txt in kernel network documentation suggests to increase
> < net.core.netdev_max_backlog to 300000. This did not help.
> <
> < - I also had to increase net.core.optmem_max, because the default
> < value was too small for 700 multicast groups.
>
> As they're all pretty simple to test, it may be worthwhile to give
> them a shot just to rule things out.
I will try them later today.
I forgot to mention that the filesystems are ext2 for the mirrored high-speed
disks and XFS for the 8TB array.
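Given that this thread is about dirty throttling, I'll also keep an eye on
the writeback state while reproducing -- a rough sketch of what I plan to
sample during the grep (paths are the stock 2.6.20 sysctls):

# watch dirty/writeback page counts while the grep runs on the array
grep -E 'Dirty|Writeback' /proc/meminfo
# and note the current throttling thresholds for reference
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio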
David Lang
* Re: [PATCH 00/23] per device dirty throttling -v8
From: david @ 2007-08-09 5:11 UTC (permalink / raw)
To: Ray Lee
Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, linux-mm,
linux-kernel, miklos, akpm, neilb, dgc, tomoki.sekiyama.qu,
nikita, trond.myklebust, yingchao.zhou, richard, netdev
On Sat, 4 Aug 2007, Ray Lee wrote:
> On 8/4/07, david@lang.hm <david@lang.hm> wrote:
>> On Sat, 4 Aug 2007, Ingo Molnar wrote:
>>
> At least on a surface level, your report has some similarities to
> http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller
> mentions several things he tried without effect:
>
> < - I increased the max allowed receive buffer through
> < /proc/sys/net/core/rmem_max and the application calls the right
> < syscall. "netstat -su" does not show any "packet receive errors".
mercury1:/proc/sys/net/core# cat rmem_*
124928
131071
mercury1:/proc/sys/net/core# netstat -su
Udp:
697853177 packets received
10025642 packets to unknown port received.
191726680 packet receive errors
63194 packets sent
RcvbufErrors: 191726680
UdpLite:
mercury1:/proc/sys/net/core# echo "512000" >rmem_max
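Those RcvbufErrors are the UDP socket receive buffer overflowing. One thing
I realized: rmem_max only raises the ceiling an application may request via
SO_RCVBUF -- unless syslogd actually asks for more, its socket stays at the
default (rmem_default, the 124928 above). So a sketch of what I'll do as
well (the restart command is whatever this box uses for syslogd):

# raise the default receive buffer and restart syslogd so its
# freshly created socket actually gets the larger size
echo 512000 > /proc/sys/net/core/rmem_default
/etc/init.d/sysklogd restart    # assumes a sysklogd-style init script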
> < - After getting "kernel: swapper: page allocation failure.
> < order:0, mode:0x20", I increased /proc/sys/vm/min_free_kbytes
I have not seen any similar errors.
> < - ixgb.txt in kernel network documentation suggests to increase
> < net.core.netdev_max_backlog to 300000. This did not help.
mercury1:/proc/sys/net/core# cat netdev_*
300
1000
mercury1:/proc/sys/net/core# echo "300000" >netdev_max_backlog
> < - I also had to increase net.core.optmem_max, because the default
> < value was too small for 700 multicast groups.
I'm not running multicast.
> As they're all pretty simple to test, it may be worthwhile to give
> them a shot just to rule things out.
Unfortunately the load is not high enough right now to see a real
difference (it's only doing ~1400 logs/sec). I'll catch it at a higher load
point to see if these make any difference.
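When I do catch it under load, a rough way to quantify the loss around a
single test run (the grep target path is hypothetical; this just samples the
overflow counter before and after):

# sample RcvbufErrors around the test to count the drops it causes
before=$(netstat -su | awk '/RcvbufErrors/ {print $2}')
grep -r somepattern /mnt/array/logs > /dev/null    # hypothetical test load
after=$(netstat -su | awk '/RcvbufErrors/ {print $2}')
echo "drops during grep: $((after - before))"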
David Lang