* Re: Runnable threads on run queue
From: Chase Venters @ 2006-07-08 21:18 UTC
To: Ask List; +Cc: linux-kernel
On Saturday 08 July 2006 15:18, Ask List wrote:
> Have an issue maybe someone on this list can help with.
>
> At times of very high load the number of processes on the run queue drops
> to 0 then jumps really high and then drops to 0 and back and forth. It
> seems to last 10 seconds or so. If you look at this vmstat you can see an
> example of what I mean. Now I'm not a Linux kernel expert, but I am thinking
> it has something to do with the scheduling algorithm and locking of the run
> queue. For this particular application I need all available threads to be
> processed as fast as possible. Is there a way for me to eliminate this
> behavior or at least minimize the window in which there are no threads on
> the run queue? Is there a sysctl parameter I can use?
If there's a runnable task on the system, the run queue should never empty
except inside schedule(). The scheduler should then swap expired and active.
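Chase's description can be made concrete with a toy Python model of the two-array (active/expired) design he is referring to. A task whose timeslice runs out moves to the expired array. When the active array empties, `schedule()` swaps the two arrays, so a runnable task never leaves the queue. This is a hypothetical sketch, not kernel code; `ToyRunqueue` and its fields are invented for illustration.

```python
from collections import deque


class ToyRunqueue:
    """Toy model of a two-array (active/expired) runqueue.

    Hypothetical simplification: no priorities, bitmaps, per-CPU queues,
    or locking. Each task is a name plus the ticks left in its timeslice.
    """

    def __init__(self, tasks, timeslice=2):
        self.timeslice = timeslice
        self.active = deque((name, timeslice) for name in tasks)
        self.expired = deque()
        self.swaps = 0

    def nr_running(self):
        # A runnable task is always in one of the two arrays.
        return len(self.active) + len(self.expired)

    def schedule(self):
        """Run one tick and return the name of the task that ran."""
        if self.nr_running() == 0:
            return None  # nothing runnable: the CPU would go idle
        if not self.active:
            # 'active' is only ever empty here, and only momentarily:
            # the arrays are swapped before a task is picked.
            self.active, self.expired = self.expired, self.active
            self.swaps += 1
        name, left = self.active.popleft()
        if left > 1:
            self.active.appendleft((name, left - 1))  # keeps running
        else:
            # Timeslice used up: refill it and park the task in 'expired'.
            self.expired.append((name, self.timeslice))
        return name
```

If you step it with three tasks, `nr_running()` stays at 3 throughout, and the arrays are swapped once between full rounds.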
First question - what kernel are you running? Is it stock?
Second question - what's the application? Are you sure your threads just
aren't falling into interruptible sleep due to an app bug of some sort? Are
you observing misbehavior in the application (long pauses) or just in the
reporting?
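One way to answer Chase's second question is to tally the state letter Linux exposes in /proc/<pid>/stat while a stall is in progress. Many S (interruptible sleep) entries alongside few R entries would point at workers waiting for something rather than at the scheduler. Below is a minimal Python sketch; the helper names are hypothetical, and it counts one entry per process listed in /proc.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may itself contain
    spaces or ')', so split on the *last* ')' rather than on whitespace.
    """
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def count_states(proc="/proc"):
    """Tally states (R = runnable, S = interruptible sleep, D = uninterruptible)."""
    states = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                states[parse_state(f.read())] += 1
        except OSError:
            continue  # the process exited between listdir() and open()
    return states


if __name__ == "__main__":
    print(dict(count_states()))
```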
Thanks,
Chase
* Re: Runnable threads on run queue
From: Ask List @ 2006-07-08 22:54 UTC
To: linux-kernel
Chase Venters <chase.venters <at> clientec.com> writes:
>
> On Saturday 08 July 2006 15:18, Ask List wrote:
> > Have an issue maybe someone on this list can help with.
> >
> > At times of very high load the number of processes on the run queue drops
> > to 0 then jumps really high and then drops to 0 and back and forth. It
> > seems to last 10 seconds or so. If you look at this vmstat you can see an
> > example of what I mean. Now I'm not a Linux kernel expert, but I am thinking
> > it has something to do with the scheduling algorithm and locking of the run
> > queue. For this particular application I need all available threads to be
> > processed as fast as possible. Is there a way for me to eliminate this
> > behavior or at least minimize the window in which there are no threads on
> > the run queue? Is there a sysctl parameter I can use?
>
> If there's a runnable task on the system, the run queue should never empty
> except inside schedule(). The scheduler should then swap expired and active.
>
> First question - what kernel are you running? Is it stock?
>
> Second question - what's the application? Are you sure your threads just
> aren't falling into interruptible sleep due to an app bug of some sort? Are
> you observing misbehavior in the application (long pauses) or just in the
> reporting?
>
> Thanks,
> Chase
>
The kernel is built from the Debian kernel source package, version 2.4.27-3,
recompiled to support SMP, high memory, etc. The application is SpamAssassin
version 3.1.1. There may be an application bug, but I do not know that for
certain. We have adjusted the daemon's configuration to try to alleviate the
symptoms, to no avail. We see the issue whether or not we use a MySQL
backend for the Bayes DB. We are seeing misbehavior in the application in
terms of how long messages take to process. It normally takes tenths of a
second to process incoming mail, but the processing time jumps to over 10
seconds each time the run queue drops to 0, and then falls back to tenths of
a second when the queue fills back up.
* Re: Runnable threads on run queue
From: Dr. David Alan Gilbert @ 2006-07-08 22:19 UTC
To: Ask List; +Cc: linux-kernel
* Ask List (askthelist@gmail.com) wrote:
> Have an issue maybe someone on this list can help with.
<snip>
> Please help.
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>   r  b   swpd    free   buff    cache  si  so   bi   bo   in    cs  us  sy  id  wa
>  83  0   1328  301684  37868  1520632   0   0    0  264  400  1332  98   2   0   0
>  17  0   1328  293936  37868  1520688   0   0    0    0  537   979  97   3   0   0
>  73  0   1328  293688  37868  1520712   0   0    0    0  268  2643  98   2   0   0
>  80  0   1328  277220  37868  1520756   0   0    0    0  351   824  98   2   0   0
>  49  0   1328  262452  37868  1520800   0   0    0    0  393  1882  97   3   0   0
>  45  0   1328  246796  37868  1520828   0   0    0  304  302  1631  96   4   0   0
>  55  0   1328  243852  37868  1520872   0   0    0    0  356  1101  99   1   0   0
>  17  0   1328  228672  37868  1520916   0   0    0    0  336   748  97   3   0   0
>   0  0   1328  299948  37868  1520956   0   0    0    0  299   821  78   3  19   0
>   0  0   1328  299184  37868  1520960   0   0    0    0  168    78   8   0  92   0
Could you also post the output of iostat -x 1 covering the same period?
(You might need to restrict the set of devices if you have a lot.)
I've seen this pattern of bursty output in apps that are just trying to do
continuous large writes, and I'm wondering what you are seeing there.
Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
* Re: Runnable threads on run queue
From: Ask List @ 2006-07-08 23:08 UTC
To: linux-kernel
I don't have iostat -x 1 output from the same time frame, but I do have the
sar data we collected before and during our period of high load. Here is
a snippet from before:
Time tps rtps wtps bread/s bwrtn/s
11:46:00 PM 5.00 0.00 5.00 0.00 160.00
11:46:01 PM 8.91 0.00 8.91 0.00 182.18
11:46:02 PM 10.00 0.00 10.00 0.00 368.00
11:46:03 PM 7.92 0.00 7.92 0.00 142.57
11:46:04 PM 13.00 0.00 13.00 0.00 336.00
11:46:05 PM 9.90 0.00 9.90 0.00 261.39
11:46:06 PM 9.00 0.00 9.00 0.00 264.00
11:46:07 PM 6.93 0.00 6.93 0.00 198.02
11:46:08 PM 8.00 0.00 8.00 0.00 288.00
11:46:09 PM 12.87 0.00 12.87 0.00 324.75
11:46:10 PM 9.00 0.00 9.00 0.00 280.00
11:46:11 PM 6.93 0.00 6.93 0.00 134.65
11:46:12 PM 12.00 0.00 12.00 0.00 336.00
11:46:13 PM 10.89 0.00 10.89 0.00 253.47
11:46:14 PM 18.00 0.00 18.00 0.00 464.00
11:46:15 PM 4.81 0.00 4.81 0.00 84.62
11:46:16 PM 10.00 0.00 10.00 0.00 328.00
11:46:17 PM 10.89 0.00 10.89 0.00 269.31
11:46:18 PM 11.00 0.00 11.00 0.00 304.00
11:46:19 PM 30.69 0.00 30.69 0.00 451.49
11:46:20 PM 9.00 0.00 9.00 0.00 272.00
11:46:21 PM 5.94 0.00 5.94 0.00 95.05
11:46:22 PM 10.00 0.00 10.00 0.00 304.00
11:46:23 PM 5.94 0.00 5.94 0.00 150.50
11:46:24 PM 17.00 0.00 17.00 0.00 432.00
11:46:25 PM 6.93 0.00 6.93 0.00 190.10
11:46:26 PM 10.00 0.00 10.00 0.00 344.00
11:46:27 PM 8.91 0.00 8.91 0.00 166.34
11:46:28 PM 7.00 0.00 7.00 0.00 192.00
11:46:29 PM 15.84 0.00 15.84 0.00 427.72
11:46:30 PM 7.00 0.00 7.00 0.00 168.00
11:46:31 PM 9.90 0.00 9.90 0.00 221.78
11:46:32 PM 12.00 0.00 12.00 0.00 360.00
11:46:33 PM 10.89 0.00 10.89 0.00 245.54
11:46:34 PM 10.00 0.00 10.00 0.00 280.00
11:46:35 PM 6.93 0.00 6.93 0.00 134.65
11:46:36 PM 11.00 0.00 11.00 0.00 296.00
11:46:37 PM 8.91 0.00 8.91 0.00 205.94
11:46:38 PM 12.00 0.00 12.00 0.00 376.00
11:46:39 PM 14.85 0.00 14.85 0.00 435.64
11:46:40 PM 9.00 0.00 9.00 0.00 248.00
11:46:41 PM 7.92 0.00 7.92 0.00 237.62
11:46:42 PM 10.00 0.00 10.00 0.00 320.00
11:46:43 PM 5.94 0.00 5.94 0.00 55.45
11:46:44 PM 15.00 0.00 15.00 0.00 408.00
11:46:45 PM 9.90 0.00 9.90 0.00 229.70
11:46:46 PM 10.00 0.00 10.00 0.00 272.00
11:46:47 PM 10.89 0.00 10.89 0.00 269.31
11:46:48 PM 10.00 0.00 10.00 0.00 272.00
11:46:49 PM 36.63 0.00 36.63 0.00 514.85
11:46:50 PM 11.00 0.00 11.00 0.00 296.00
11:46:51 PM 8.91 0.00 8.91 0.00 205.94
11:46:52 PM 11.00 0.00 11.00 0.00 312.00
11:46:53 PM 8.91 0.00 8.91 0.00 190.10
11:46:54 PM 15.00 0.00 15.00 0.00 368.00
11:46:55 PM 9.90 0.00 9.90 0.00 253.47
11:46:56 PM 11.00 0.00 11.00 0.00 352.00
11:46:57 PM 8.91 0.00 8.91 0.00 245.54
11:46:58 PM 9.00 0.00 9.00 0.00 256.00
11:46:59 PM 11.88 0.00 11.88 0.00 308.91
11:47:00 PM 7.00 0.00 7.00 0.00 168.00
And here is a snippet during high load (same columns):
12:13:00 AM 6.00 0.00 6.00 0.00 224.00
12:13:01 AM 8.06 0.00 8.06 0.00 180.65
12:13:02 AM 18.00 0.00 18.00 0.00 544.00
12:13:03 AM 8.00 0.00 8.00 0.00 192.00
12:13:04 AM 47.00 0.00 47.00 0.00 856.00
12:13:05 AM 8.91 0.00 8.91 0.00 229.70
12:13:06 AM 15.00 0.00 15.00 0.00 392.00
12:13:07 AM 9.90 0.00 9.90 0.00 229.70
12:13:08 AM 8.00 0.00 8.00 0.00 232.00
12:13:09 AM 15.52 0.00 15.52 0.00 379.31
12:13:10 AM 6.98 0.00 6.98 0.00 198.45
12:13:12 AM 12.96 0.00 12.96 0.00 348.15
12:13:13 AM 17.00 0.00 17.00 0.00 424.00
12:13:14 AM 28.74 0.00 28.74 0.00 526.95
12:13:15 AM 13.46 0.00 13.46 0.00 361.54
12:13:16 AM 9.40 0.00 9.40 0.00 225.64
12:13:17 AM 15.00 0.00 15.00 0.00 488.00
12:13:19 AM 14.91 0.00 14.91 0.00 377.64
12:13:20 AM 9.00 0.00 9.00 0.00 296.00
12:13:21 AM 12.15 0.00 12.15 0.00 366.36
12:13:22 AM 26.00 0.00 26.00 0.00 784.00
12:13:24 AM 11.06 0.00 11.06 0.00 324.42
12:13:25 AM 14.81 0.00 14.81 0.00 333.33
12:13:26 AM 25.47 0.00 25.47 0.00 777.36
12:13:27 AM 19.00 0.00 19.00 0.00 480.00
12:13:28 AM 20.79 0.00 20.79 0.00 538.61
12:13:29 AM 5.00 0.00 5.00 0.00 136.00
12:13:31 AM 12.73 0.00 12.73 0.00 298.18
12:13:32 AM 23.00 0.00 23.00 0.00 632.00
12:13:33 AM 36.79 0.00 36.79 0.00 1011.32
12:13:34 AM 37.50 0.00 37.50 0.00 950.00
12:13:35 AM 7.76 0.00 7.76 0.00 186.21
12:13:36 AM 12.93 0.00 12.93 0.00 324.14
12:13:38 AM 8.57 0.00 8.57 0.00 210.29
12:13:39 AM 30.00 0.00 30.00 0.00 696.00
12:13:40 AM 9.90 0.00 9.90 0.00 245.54
12:13:41 AM 12.00 0.00 12.00 0.00 328.00
12:13:42 AM 5.94 0.00 5.94 0.00 63.37
12:13:43 AM 7.00 0.00 7.00 0.00 256.00
12:13:44 AM 44.54 0.00 44.54 0.00 746.22
12:13:45 AM 9.71 0.00 9.71 0.00 248.54
12:13:46 AM 13.89 0.00 13.89 0.00 370.37
12:13:47 AM 13.00 0.00 13.00 0.00 336.00
12:13:48 AM 13.86 0.00 13.86 0.00 324.75
12:13:49 AM 15.00 0.00 15.00 0.00 344.00
12:13:50 AM 3.96 0.00 3.96 0.00 39.60
12:13:51 AM 11.00 0.00 11.00 0.00 368.00
12:13:52 AM 7.92 0.00 7.92 0.00 174.26
12:13:54 AM 10.17 0.00 10.17 0.00 266.67
12:13:55 AM 7.41 0.00 7.41 0.00 133.33
12:13:56 AM 15.00 0.00 15.00 0.00 328.00
12:13:57 AM 5.71 0.00 5.71 0.00 91.43
12:13:58 AM 9.68 0.00 9.68 0.00 316.13
12:13:59 AM 24.27 0.00 24.27 0.00 520.39
12:14:00 AM 10.89 0.00 10.89 0.00 324.75
...
I hope this helps.
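You can compare the two windows numerically rather than by eye by averaging each column. Below is a minimal Python sketch. It assumes the `sar -b` layout shown above (a 12-hour timestamp followed by five numeric columns); `parse_sar_b` and `summarize` are hypothetical helper names.

```python
from statistics import mean

COLUMNS = ("tps", "rtps", "wtps", "bread/s", "bwrtn/s")


def parse_sar_b(lines):
    """Parse 'HH:MM:SS AM|PM tps rtps wtps bread/s bwrtn/s' rows.

    Assumes the seven-token layout shown in this thread; header lines and
    anything else that doesn't match are skipped.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 7 or parts[1] not in ("AM", "PM"):
            continue
        try:
            rows.append(dict(zip(COLUMNS, map(float, parts[2:]))))
        except ValueError:
            continue  # a non-numeric field: not a data row
    return rows


def summarize(rows, column="bwrtn/s"):
    """Return (mean, peak) of one column; rows must be non-empty."""
    values = [r[column] for r in rows]
    return mean(values), max(values)
```

Running it separately on the "before" and "during" snippets would show whether write throughput actually changes during the stalls.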
* Re: Runnable threads on run queue
From: Mike Galbraith @ 2006-07-09 7:20 UTC
To: Ask List; +Cc: linux-kernel
On Sat, 2006-07-08 at 20:18 +0000, Ask List wrote:
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>   r  b   swpd    free   buff    cache  si  so   bi   bo   in    cs  us  sy  id  wa
>  83  0   1328  301684  37868  1520632   0   0    0  264  400  1332  98   2   0   0
>  17  0   1328  293936  37868  1520688   0   0    0    0  537   979  97   3   0   0
>  73  0   1328  293688  37868  1520712   0   0    0    0  268  2643  98   2   0   0
>  80  0   1328  277220  37868  1520756   0   0    0    0  351   824  98   2   0   0
>  49  0   1328  262452  37868  1520800   0   0    0    0  393  1882  97   3   0   0
>  45  0   1328  246796  37868  1520828   0   0    0  304  302  1631  96   4   0   0
>  55  0   1328  243852  37868  1520872   0   0    0    0  356  1101  99   1   0   0
>  17  0   1328  228672  37868  1520916   0   0    0    0  336   748  97   3   0   0
>   0  0   1328  299948  37868  1520956   0   0    0    0  299   821  78   3  19   0
>   0  0   1328  299184  37868  1520960   0   0    0    0  168    78   8   0  92   0
>   0  0   1328  299184  37868  1520960   0   0    0  248  173    38   0   1  99   0
>   0  0   1328  299184  37868  1520960   0   0    0    0  160    20   0   0 100   0
>   0  0   1328  299184  37868  1520960   0   0    0    0  151     6   0   0 100   0
>   0  0   1328  299184  37868  1520960   0   0    0    0  162    42   0   1  99   0
>   1  0   1328  299188  37868  1520960   0   0    0    0  161    24   0   0 100   0
>   0  0   1328  298808  37868  1520988   0   0    0  100  303  1119  57   0  42   0
>   0  0   1328  298808  37868  1520988   0   0    0    0  162    22   0   1  99   0
Looking at the interrupts column, I suspect you have a network problem,
not a scheduler problem. Looks to me like your SpamAssassins are simply
running out of work to do because your network traffic comes in bursts.
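Mike's reading can be checked mechanically. Split the vmstat samples into busy and mostly idle ones, then compare the `in` (interrupts per second) column between the two groups. Below is a minimal Python sketch; the 90% idle threshold and the helper names are arbitrary choices for illustration.

```python
VMSTAT_FIELDS = ("r", "b", "swpd", "free", "buff", "cache", "si", "so",
                 "bi", "bo", "in", "cs", "us", "sy", "id", "wa")


def parse_vmstat(lines):
    """Turn vmstat data rows into dicts; header lines are skipped."""
    rows = []
    for line in lines:
        parts = line.lstrip("> ").split()  # also accepts '>'-quoted lines
        if len(parts) == len(VMSTAT_FIELDS) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(VMSTAT_FIELDS, map(int, parts))))
    return rows


def _average(values):
    return sum(values) / len(values) if values else float("nan")


def interrupts_by_phase(rows, idle_threshold=90):
    """Mean interrupts/s ('in') for busy vs. mostly idle samples."""
    busy = [r["in"] for r in rows if r["id"] < idle_threshold]
    idle = [r["in"] for r in rows if r["id"] >= idle_threshold]
    return _average(busy), _average(idle)
```

On the 17 samples quoted above, this gives about 355 interrupts/s for the busy samples and about 162 for the idle ones. That is consistent with Mike's reading. On its own, though, it cannot say whether the drop in traffic caused the idle stretches or merely accompanied them.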
-Mike
* Re: Runnable threads on run queue
From: Horst von Brand @ 2006-07-09 23:38 UTC
To: Mike Galbraith; +Cc: Ask List, linux-kernel
Mike Galbraith <efault@gmx.de> wrote:
> On Sat, 2006-07-08 at 20:18 +0000, Ask List wrote:
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >   r  b   swpd    free   buff    cache  si  so   bi   bo   in    cs  us  sy  id  wa
[...]
> Looking at the interrupts column, I suspect you have a network problem,
> not a scheduler problem. Looks to me like your SpamAssassins are simply
> running out of work to do because your network traffic comes in bursts.
SpamAssassin acted up here some time ago. With per-user training, certain
messages sent it into a loop and the load went through the roof. We couldn't
find a cure, and a few hundred users with large personalized rule files
were causing problems anyway, so we axed it.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
* Re: Runnable threads on run queue
From: Ask List @ 2006-07-12 4:14 UTC
To: linux-kernel
Mike Galbraith <efault <at> gmx.de> writes:
...
> Looking at the interrupts column, I suspect you have a network problem,
> not a scheduler problem. Looks to me like your SpamAssassins are simply
> running out of work to do because your network traffic comes in bursts.
>
> -Mike
>
>
Network problem? So you're saying our mail servers are not sending spam traffic
fast enough, and that's why the SpamAssassin processes run out of work? So when
our mail servers are not sending spam traffic, we see our CPU usage, context
switches, interrupts, and runnable threads drop...?
I'd really like to believe this is true; however, the SpamAssassin logs still
show plenty of B (busy threads)...
* Re: Runnable threads on run queue
From: Mike Galbraith @ 2006-07-12 5:40 UTC
To: Ask List; +Cc: linux-kernel
On Wed, 2006-07-12 at 04:14 +0000, Ask List wrote:
> Network problem? So you're saying our mail servers are not sending spam traffic
> fast enough, and that's why the SpamAssassin processes run out of work? So when
> our mail servers are not sending spam traffic, we see our CPU usage, context
> switches, interrupts, and runnable threads drop...?
More or less, yes. I think somebody is dropping the communication ball.
-Mike
* Re: Runnable threads on run queue
From: Rik van Riel @ 2006-07-09 8:33 UTC
To: Ask List; +Cc: linux-kernel
Ask List wrote:
> Have an issue maybe someone on this list can help with.
>
> At times of very high load the number of processes on the run queue drops to
> 0 then jumps really high and then drops to 0 and back and forth. It seems to
> last 10 seconds or so.
Are you using sendmail by any chance? :)
We start out with a low load average, so sendmail forks as many
spamassassins as it can...
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>   r  b   swpd    free   buff    cache  si  so   bi   bo   in    cs  us  sy  id  wa
>  83  0   1328  301684  37868  1520632   0   0    0  264  400  1332  98   2   0   0
>  17  0   1328  293936  37868  1520688   0   0    0    0  537   979  97   3   0   0
>  73  0   1328  293688  37868  1520712   0   0    0    0  268  2643  98   2   0   0
>  80  0   1328  277220  37868  1520756   0   0    0    0  351   824  98   2   0   0
>  49  0   1328  262452  37868  1520800   0   0    0    0  393  1882  97   3   0   0
>  45  0   1328  246796  37868  1520828   0   0    0  304  302  1631  96   4   0   0
>  55  0   1328  243852  37868  1520872   0   0    0    0  356  1101  99   1   0   0
>  17  0   1328  228672  37868  1520916   0   0    0    0  336   748  97   3   0   0
>   0  0   1328  299948  37868  1520956   0   0    0    0  299   821  78   3  19   0
>   0  0   1328  299184  37868  1520960   0   0    0    0  168    78   8   0  92   0
... and guess what?
The load average went through the roof, so sendmail stops forking
spamassassins. Now nothing is running, and sendmail will not start
forking new spamassassins again until after the load average has
decayed to an acceptable level.
After that, it will fork way too many at once again, and the load
average will go through the roof. Lather, rinse, repeat.
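Rik's feedback loop can be reproduced with a toy simulation. The decay factor follows how Linux maintains the 1-minute load average: an exponentially damped average refreshed every 5 seconds. Everything else is a hypothetical assumption. The MTA is assumed to check the 1-minute figure once per tick, and there is always mail waiting. The threshold, burst size, and capacity are made-up numbers chosen only to show the shape of the behavior; `simulate_la_gate` is an invented name.

```python
import math

# The kernel refreshes the load average every 5 seconds; the 1-minute
# figure is an exponentially damped average with this decay factor.
DECAY_1MIN = math.exp(-5 / 60)


def simulate_la_gate(ticks=60, threshold=8.0, burst=40, capacity=10):
    """Toy model of load-average-gated forking (hypothetical parameters).

    Each tick is 5 seconds and mail is always waiting. While the 1-minute
    load average is below `threshold`, the MTA forks `burst` new filters;
    the CPUs finish at most `capacity` messages per tick. Returns one
    (runnable, loadavg) pair per tick.
    """
    loadavg, runnable, history = 0.0, 0, []
    for _ in range(ticks):
        if loadavg < threshold:
            runnable += burst  # load looks low: fork a big batch
        loadavg = loadavg * DECAY_1MIN + runnable * (1 - DECAY_1MIN)
        history.append((runnable, loadavg))
        runnable -= min(runnable, capacity)  # work completed this tick
    return history
```

With these made-up defaults, the runnable count climbs to 70 and drains to zero within a few ticks. It then stays at zero for about ten ticks (roughly 50 seconds) while the lagging load average decays back below the threshold, even though mail is still queued.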
You'd probably be better off limiting the number of simultaneous
local mail deliveries to something reasonable, so the load average
always stays at an acceptable level - and more importantly, all of
the CPU capacity could be used if needed...
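Rik's alternative, a fixed cap on simultaneous deliveries instead of a load-average trigger, amounts to a bounded worker pool: new work waits for a free slot rather than being gated on a lagging average. Below is a minimal Python sketch of that idea. It is not sendmail or spamd configuration; `filter_message` is a hypothetical stand-in for handing one message to the filter, and the cap of 4 is an arbitrary example value.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class PeakCounter:
    """Thread-safe in-flight counter that remembers its highest value."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False


def filter_message(msg, counter):
    """Hypothetical stand-in for scanning one message."""
    with counter:
        time.sleep(0.005)  # pretend to do some work
        return f"scanned {msg}"


def deliver_all(messages, max_concurrent=4):
    """Process every message with at most `max_concurrent` in flight."""
    counter = PeakCounter()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(partial(filter_message, counter=counter), messages))
    return results, counter.peak
```

The number of filters in flight never exceeds the cap. As long as mail is waiting, a freed slot is refilled immediately, so the CPUs are not left idle while an average decays. Choosing the cap is a judgment call for the site, not a rule from this thread.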
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan