Runnable threads on run queue

public inbox for linux-kernel@vger.kernel.org

* Runnable threads on run queue
@ 2006-07-08 20:18 Ask List
  2006-07-08 21:18 ` Chase Venters
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Ask List @ 2006-07-08 20:18 UTC (permalink / raw)
  To: linux-kernel

I have an issue that maybe someone on this list can help with.

At times of very high load, the number of processes on the run queue drops to
0, then jumps really high, then drops back to 0, and so on. Each swing seems to
last 10 seconds or so. The vmstat output below shows an example of what I mean.
I'm not a Linux kernel expert, but I suspect it has something to do with the
scheduling algorithm and locking of the run queue. For this particular
application I need all available threads to be processed as fast as possible.
Is there a way to eliminate this behavior, or at least minimize the window in
which there are no threads on the run queue? Is there a sysctl parameter I can
use?

Please help.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
83  0   1328 301684  37868 1520632    0    0     0   264  400  1332 98  2  0  0
17  0   1328 293936  37868 1520688    0    0     0     0  537   979 97  3  0  0
73  0   1328 293688  37868 1520712    0    0     0     0  268  2643 98  2  0  0
80  0   1328 277220  37868 1520756    0    0     0     0  351   824 98  2  0  0
49  0   1328 262452  37868 1520800    0    0     0     0  393  1882 97  3  0  0
45  0   1328 246796  37868 1520828    0    0     0   304  302  1631 96  4  0  0
55  0   1328 243852  37868 1520872    0    0     0     0  356  1101 99  1  0  0
17  0   1328 228672  37868 1520916    0    0     0     0  336   748 97  3  0  0
 0  0   1328 299948  37868 1520956    0    0     0     0  299   821 78  3 19  0
 0  0   1328 299184  37868 1520960    0    0     0     0  168    78  8  0 92  0
 0  0   1328 299184  37868 1520960    0    0     0   248  173    38  0  1 99  0
 0  0   1328 299184  37868 1520960    0    0     0     0  160    20  0  0 100  0
 0  0   1328 299184  37868 1520960    0    0     0     0  151     6  0  0 100  0
 0  0   1328 299184  37868 1520960    0    0     0     0  162    42  0  1 99  0
 1  0   1328 299188  37868 1520960    0    0     0     0  161    24  0  0 100  0
 0  0   1328 298808  37868 1520988    0    0     0   100  303  1119 57  0 42  0
 0  0   1328 298808  37868 1520988    0    0     0     0  162    22  0  1 99  0
 3  0   1328 298808  37868 1520992    0    0     0     0  195   233 16  0 84  0
14  0   1328 298788  37868 1521032    0    0     0     0  400  1158 87  3 10  0
54  0   1328 298860  37868 1521064    0    0     0     0  438   940 97  3  0  0
80  0   1328 298296  37868 1521092    0    0     0   180  476   556 97  3  0  0
29  0   1328 294632  37868 1521148    0    0     0     0  824  1178 99  1  0  0
68  0   1328 292936  37868 1521172    0    0     0     0  404  2283 96  4  0  0
73  0   1328 292740  37868 1521216    0    0     0     0  521   828 98  2  0  0
38  0   1328 260340  37868 1521260    0    0     0     0  405  1069 96  4  0  0
46  0   1328 253072  37868 1521292    0    0     0   300  371  1692 95  5  0  0
71  0   1328 244084  37868 1521328    0    0     0     0  357  1478 98  2  0  0
71  0   1328 233916  37868 1521384    0    0     0     0  528  1121 97  3  0  0
32  0   1328 222784  37868 1521416    0    0     0     0  347  1191 96  4  0  0
76  0   1328 212396  37868 1521448    0    0     0     0  337  2526 97  3  0  0
71  0   1328 198684  37868 1521488    0    0     0   284  497   942 98  2  0  0
40  0   1328 189964  37868 1521532    0    0     0     0  420  1525 96  4  0  0
53  0   1328 179656  37868 1521576    0    0     0     0  391  1983 98  2  0  0
91  0   1328 169164  37868 1521608    0    0     0     0  415  2018 98  2  0  0
70  0   1328 151300  37868 1521648    0    0     0     0  411  1769 98  2  0  0
43  0   1328 145980  37868 1521684    0    0     0   308  420  1713 96  4  0  0
48  0   1328 142708  37868 1521724    0    0     0     0  290  1490 97  3  0  0
76  0   1328 126080  37868 1521752    0    0     0     0  389  1568 97  3  0  0
85  0   1328 120544  37864 1518164    0    0     0     0  365  1261 96  4  0  0
51  0   1328 121312  37864 1506908    0    0     0     0  306  1217 98  2  0  0
55  0   1328 121488  37864 1495128    0    0     0   292  364  1976 98  2  0  0
79  0   1328 120408  37864 1486072    0    0     0     0  328  2106 97  3  0  0
29  0   1328 216660  37864 1482744    0    0     0     0  387   866 97  3  0  0
 0  0   1328 321932  37864 1482788    0    0     0     0  289   750 67  3 31  0
 0  0   1328 321932  37864 1482788    0    0     0     0  158    10  0  0 100  0
 2  0   1328 321912  37864 1482792    0    0     0   268  201   156  4  1 94  0
 0  0   1328 321892  37864 1482796    0    0     0     0  180   270  7  0 93  0
 0  0   1328 321892  37864 1482796    0    0     0     0  152     4  0  0 100  0
 0  0   1328 321880  37864 1482796    0    0     0     0  158    26  0  1 99  0
 0  0   1328 321844  37864 1482820    0    0     0     0  330   454 41  1 58  0
 0  0   1328 321844  37864 1482820    0    0     0   120  167    30  0  0 100  0
 0  0   1328 321844  37864 1482820    0    0     0     0  166    35  1  0 99  0
35  0   1328 321476  37864 1482836    0    0     0     0  530  1026 67  2 31  0
76  0   1328 321528  37868 1482864    0    0     0     0  406  1744 96  4  0  0
41  0   1328 321172  37868 1482920    0    0     0   192  409   690 97  3  0  0
34  0   1328 314788  37868 1482956    0    0     0     0  356  1616 97  3  0  0
63  0   1328 314368  37868 1482996    0    0     0     0  437  1277 98  2  0  0
 1  0   1328 331744  37868 1483044    0    0     0     0  331   709 90  3  7  0
 0  0   1328 331724  37868 1483048    0    0     0     0  174   395  4  0 96  0
 0  0   1328 331724  37868 1483048    0    0     0   224  168    16  0  0 100  0
 0  0   1328 331724  37868 1483048    0    0     0     0  167    54  0  1 99  0
 7  0   1328 331744  37868 1483048    0    0     0     0  238   167 10  0 90  0
46  0   1328 330788  37868 1483076    0    0     0     0  878  1677 98  2  0  0
84  0   1328 330444  37868 1483100    0    0     0     0  425  1449 97  3  0  0


* Re: Runnable threads on run queue
  2006-07-08 20:18 Ask List
@ 2006-07-08 21:18 ` Chase Venters
  2006-07-08 22:54   ` Ask List
  2006-07-08 22:19 ` Dr. David Alan Gilbert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Chase Venters @ 2006-07-08 21:18 UTC (permalink / raw)
  To: Ask List; +Cc: linux-kernel

On Saturday 08 July 2006 15:18, Ask List wrote:
> Have an issue maybe someone on this list can help with.
>
> At times of very high load the number of processes on the run queue drops
> to 0 then jumps really high and then drops to 0 and back and forth. It
> seems to last 10 seconds or so. If you look at this vmstat you can see an
> example of what I mean. Now I'm not a Linux kernel expert but I am thinking
> it has something to do with the scheduling algorithm and locking of the run
> queue. For this particular application I need all available threads to be
> processed as fast as possible. Is there a way for me to eliminate this
> behavior or at least minimize the window in which there are no threads on
> the run queue? Is there a sysctl parameter I can use?

If there's a runnable task on the system, the run queue should never empty 
except inside schedule(). The scheduler should then swap expired and active.
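
A toy sketch of the "swap expired and active" step mentioned above, written in Python for readability. It only models the two-array idea from the 2.6-era O(1) scheduler; the class and method names are made up for illustration and are not kernel API, and the real code is C with per-priority lists and bitmaps.

```python
from collections import deque


class ToyRunQueue:
    """Toy two-array run queue. All names here are illustrative only."""

    def __init__(self, tasks):
        self.active = deque(tasks)   # tasks that still have timeslice left
        self.expired = deque()       # tasks that used up their timeslice

    def pick_next(self):
        """Return the next task to run, or None if nothing is runnable."""
        if not self.active:
            # Active array drained: swap the arrays instead of going idle.
            self.active, self.expired = self.expired, self.active
        if not self.active:
            return None              # truly idle: no runnable task anywhere
        return self.active.popleft()

    def timeslice_expired(self, task):
        """A task that used its whole slice waits in the expired array."""
        self.expired.append(task)


rq = ToyRunQueue(["A", "B", "C"])
order = []
for _ in range(6):
    task = rq.pick_next()
    order.append(task)
    rq.timeslice_expired(task)
print(order)  # ['A', 'B', 'C', 'A', 'B', 'C']
```

The point of the sketch is that an empty active array by itself never leaves a CPU idle: as long as any task is waiting in the expired array, the swap makes it runnable again immediately.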

First question - what kernel are you running? Is it stock?

Second question - what's the application? Are you sure your threads just 
aren't falling into interruptible sleep due to an app bug of some sort? Are 
you observing misbehavior in the application (long pauses) or just in the 
reporting?

Thanks,
Chase

* Re: Runnable threads on run queue
  2006-07-08 20:18 Ask List
  2006-07-08 21:18 ` Chase Venters
@ 2006-07-08 22:19 ` Dr. David Alan Gilbert
  2006-07-08 23:08   ` Ask List
  2006-07-09  7:20 ` Mike Galbraith
  2006-07-09  8:33 ` Rik van Riel
  3 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2006-07-08 22:19 UTC (permalink / raw)
  To: Ask List; +Cc: linux-kernel

* Ask List (askthelist@gmail.com) wrote:
> Have an issue maybe someone on this list can help with. 

<snip>

> Please help.
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 83  0   1328 301684  37868 1520632    0    0     0   264  400  1332 98  2  0  0
> 17  0   1328 293936  37868 1520688    0    0     0     0  537   979 97  3  0  0
> 73  0   1328 293688  37868 1520712    0    0     0     0  268  2643 98  2  0  0
> 80  0   1328 277220  37868 1520756    0    0     0     0  351   824 98  2  0  0
> 49  0   1328 262452  37868 1520800    0    0     0     0  393  1882 97  3  0  0
> 45  0   1328 246796  37868 1520828    0    0     0   304  302  1631 96  4  0  0
> 55  0   1328 243852  37868 1520872    0    0     0     0  356  1101 99  1  0  0
> 17  0   1328 228672  37868 1520916    0    0     0     0  336   748 97  3  0  0
>  0  0   1328 299948  37868 1520956    0    0     0     0  299   821 78  3 19  0
>  0  0   1328 299184  37868 1520960    0    0     0     0  168    78  8  0 92  0

Could you also post the output of iostat -x 1  covering the same period?
(You might need to restrict the set of devices if you have a lot)
The pattern of bursts of output is something I've seen on apps
just trying to do continuous large writes and I'm wondering
what you are seeing there.

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

* Re: Runnable threads on run queue
  2006-07-08 21:18 ` Chase Venters
@ 2006-07-08 22:54   ` Ask List
  0 siblings, 0 replies; 13+ messages in thread
From: Ask List @ 2006-07-08 22:54 UTC (permalink / raw)
  To: linux-kernel

Chase Venters <chase.venters <at> clientec.com> writes:

> 
> On Saturday 08 July 2006 15:18, Ask List wrote:
> > Have an issue maybe someone on this list can help with.
> >
> > At times of very high load the number of processes on the run queue drops
> > to 0 then jumps really high and then drops to 0 and back and forth. It
> > seems to last 10 seconds or so. If you look at this vmstat you can see an
> > example of what I mean. Now I'm not a Linux kernel expert but I am thinking
> > it has something to do with the scheduling algorithm and locking of the run
> > queue. For this particular application I need all available threads to be
> > processed as fast as possible. Is there a way for me to eliminate this
> > behavior or at least minimize the window in which there are no threads on
> > the run queue? Is there a sysctl parameter I can use?
> 
> If there's a runnable task on the system, the run queue should never empty 
> except inside schedule(). The scheduler should then swap expired and active.
> 
> First question - what kernel are you running? Is it stock?
> 
> Second question - what's the application? Are you sure your threads just 
> aren't falling into interruptible sleep due to an app bug of some sort? Are 
> you observing misbehavior in the application (long pauses) or just in the 
> reporting?
> 
> Thanks,
> Chase
> 

The kernel is built from the Debian kernel source, version 2.4.27-3, recompiled
to support SMP, high memory, etc. The application is SpamAssassin version 3.1.1.
There may be an app bug, but I do not know that for certain. We have changed the
daemon's configuration to try to alleviate the symptoms, to no avail. We see the
issue whether or not we use a MySQL backend for the Bayes DB. The misbehavior we
observe in the application is in how long messages take to process. It normally
takes tenths of a second to process incoming mail, but the processing time jumps
to over 10 seconds each time the run queue drops to 0, and falls back to tenths
of a second when the queue fills up again.

* Re: Runnable threads on run queue
  2006-07-08 22:19 ` Dr. David Alan Gilbert
@ 2006-07-08 23:08   ` Ask List
  0 siblings, 0 replies; 13+ messages in thread
From: Ask List @ 2006-07-08 23:08 UTC (permalink / raw)
  To: linux-kernel

I don't have iostat -x 1 output from exactly the same time frame, but I do have
the sar data collected before and during our period of high load. Here is a
snippet from before ...

                 tps      rtps      wtps   bread/s   bwrtn/s
11:46:00 PM      5.00      0.00      5.00      0.00    160.00
11:46:01 PM      8.91      0.00      8.91      0.00    182.18
11:46:02 PM     10.00      0.00     10.00      0.00    368.00
11:46:03 PM      7.92      0.00      7.92      0.00    142.57
11:46:04 PM     13.00      0.00     13.00      0.00    336.00
11:46:05 PM      9.90      0.00      9.90      0.00    261.39
11:46:06 PM      9.00      0.00      9.00      0.00    264.00
11:46:07 PM      6.93      0.00      6.93      0.00    198.02
11:46:08 PM      8.00      0.00      8.00      0.00    288.00
11:46:09 PM     12.87      0.00     12.87      0.00    324.75
11:46:10 PM      9.00      0.00      9.00      0.00    280.00
11:46:11 PM      6.93      0.00      6.93      0.00    134.65
11:46:12 PM     12.00      0.00     12.00      0.00    336.00
11:46:13 PM     10.89      0.00     10.89      0.00    253.47
11:46:14 PM     18.00      0.00     18.00      0.00    464.00
11:46:15 PM      4.81      0.00      4.81      0.00     84.62
11:46:16 PM     10.00      0.00     10.00      0.00    328.00
11:46:17 PM     10.89      0.00     10.89      0.00    269.31
11:46:18 PM     11.00      0.00     11.00      0.00    304.00
11:46:19 PM     30.69      0.00     30.69      0.00    451.49
11:46:20 PM      9.00      0.00      9.00      0.00    272.00
11:46:21 PM      5.94      0.00      5.94      0.00     95.05
11:46:22 PM     10.00      0.00     10.00      0.00    304.00
11:46:23 PM      5.94      0.00      5.94      0.00    150.50
11:46:24 PM     17.00      0.00     17.00      0.00    432.00
11:46:25 PM      6.93      0.00      6.93      0.00    190.10
11:46:26 PM     10.00      0.00     10.00      0.00    344.00
11:46:27 PM      8.91      0.00      8.91      0.00    166.34
11:46:28 PM      7.00      0.00      7.00      0.00    192.00
11:46:29 PM     15.84      0.00     15.84      0.00    427.72
11:46:30 PM      7.00      0.00      7.00      0.00    168.00
11:46:31 PM      9.90      0.00      9.90      0.00    221.78
11:46:32 PM     12.00      0.00     12.00      0.00    360.00
11:46:33 PM     10.89      0.00     10.89      0.00    245.54
11:46:34 PM     10.00      0.00     10.00      0.00    280.00
11:46:35 PM      6.93      0.00      6.93      0.00    134.65
11:46:36 PM     11.00      0.00     11.00      0.00    296.00
11:46:37 PM      8.91      0.00      8.91      0.00    205.94
11:46:38 PM     12.00      0.00     12.00      0.00    376.00
11:46:39 PM     14.85      0.00     14.85      0.00    435.64
11:46:40 PM      9.00      0.00      9.00      0.00    248.00
11:46:41 PM      7.92      0.00      7.92      0.00    237.62
11:46:42 PM     10.00      0.00     10.00      0.00    320.00
11:46:43 PM      5.94      0.00      5.94      0.00     55.45
11:46:44 PM     15.00      0.00     15.00      0.00    408.00
11:46:45 PM      9.90      0.00      9.90      0.00    229.70
11:46:46 PM     10.00      0.00     10.00      0.00    272.00
11:46:47 PM     10.89      0.00     10.89      0.00    269.31
11:46:48 PM     10.00      0.00     10.00      0.00    272.00
11:46:49 PM     36.63      0.00     36.63      0.00    514.85
11:46:50 PM     11.00      0.00     11.00      0.00    296.00
11:46:51 PM      8.91      0.00      8.91      0.00    205.94
11:46:52 PM     11.00      0.00     11.00      0.00    312.00
11:46:53 PM      8.91      0.00      8.91      0.00    190.10
11:46:54 PM     15.00      0.00     15.00      0.00    368.00
11:46:55 PM      9.90      0.00      9.90      0.00    253.47
11:46:56 PM     11.00      0.00     11.00      0.00    352.00
11:46:57 PM      8.91      0.00      8.91      0.00    245.54
11:46:58 PM      9.00      0.00      9.00      0.00    256.00
11:46:59 PM     11.88      0.00     11.88      0.00    308.91
11:47:00 PM      7.00      0.00      7.00      0.00    168.00

and here is a snippet during high load....

12:13:00 AM      6.00      0.00      6.00      0.00    224.00
12:13:01 AM      8.06      0.00      8.06      0.00    180.65
12:13:02 AM     18.00      0.00     18.00      0.00    544.00
12:13:03 AM      8.00      0.00      8.00      0.00    192.00
12:13:04 AM     47.00      0.00     47.00      0.00    856.00
12:13:05 AM      8.91      0.00      8.91      0.00    229.70
12:13:06 AM     15.00      0.00     15.00      0.00    392.00
12:13:07 AM      9.90      0.00      9.90      0.00    229.70
12:13:08 AM      8.00      0.00      8.00      0.00    232.00
12:13:09 AM     15.52      0.00     15.52      0.00    379.31
12:13:10 AM      6.98      0.00      6.98      0.00    198.45
12:13:12 AM     12.96      0.00     12.96      0.00    348.15
12:13:13 AM     17.00      0.00     17.00      0.00    424.00
12:13:14 AM     28.74      0.00     28.74      0.00    526.95
12:13:15 AM     13.46      0.00     13.46      0.00    361.54
12:13:16 AM      9.40      0.00      9.40      0.00    225.64
12:13:17 AM     15.00      0.00     15.00      0.00    488.00
12:13:19 AM     14.91      0.00     14.91      0.00    377.64
12:13:20 AM      9.00      0.00      9.00      0.00    296.00
12:13:21 AM     12.15      0.00     12.15      0.00    366.36
12:13:22 AM     26.00      0.00     26.00      0.00    784.00
12:13:24 AM     11.06      0.00     11.06      0.00    324.42
12:13:25 AM     14.81      0.00     14.81      0.00    333.33
12:13:26 AM     25.47      0.00     25.47      0.00    777.36
12:13:27 AM     19.00      0.00     19.00      0.00    480.00
12:13:28 AM     20.79      0.00     20.79      0.00    538.61
12:13:29 AM      5.00      0.00      5.00      0.00    136.00
12:13:31 AM     12.73      0.00     12.73      0.00    298.18
12:13:32 AM     23.00      0.00     23.00      0.00    632.00
12:13:33 AM     36.79      0.00     36.79      0.00   1011.32
12:13:34 AM     37.50      0.00     37.50      0.00    950.00
12:13:35 AM      7.76      0.00      7.76      0.00    186.21
12:13:36 AM     12.93      0.00     12.93      0.00    324.14
12:13:38 AM      8.57      0.00      8.57      0.00    210.29
12:13:39 AM     30.00      0.00     30.00      0.00    696.00
12:13:40 AM      9.90      0.00      9.90      0.00    245.54
12:13:41 AM     12.00      0.00     12.00      0.00    328.00
12:13:42 AM      5.94      0.00      5.94      0.00     63.37
12:13:43 AM      7.00      0.00      7.00      0.00    256.00
12:13:44 AM     44.54      0.00     44.54      0.00    746.22
12:13:45 AM      9.71      0.00      9.71      0.00    248.54
12:13:46 AM     13.89      0.00     13.89      0.00    370.37
12:13:47 AM     13.00      0.00     13.00      0.00    336.00
12:13:48 AM     13.86      0.00     13.86      0.00    324.75
12:13:49 AM     15.00      0.00     15.00      0.00    344.00
12:13:50 AM      3.96      0.00      3.96      0.00     39.60
12:13:51 AM     11.00      0.00     11.00      0.00    368.00
12:13:52 AM      7.92      0.00      7.92      0.00    174.26
12:13:54 AM     10.17      0.00     10.17      0.00    266.67
12:13:55 AM      7.41      0.00      7.41      0.00    133.33
12:13:56 AM     15.00      0.00     15.00      0.00    328.00
12:13:57 AM      5.71      0.00      5.71      0.00     91.43
12:13:58 AM      9.68      0.00      9.68      0.00    316.13
12:13:59 AM     24.27      0.00     24.27      0.00    520.39
12:14:00 AM     10.89      0.00     10.89      0.00    324.75

... I hope this helps.


* Re: Runnable threads on run queue
  2006-07-08 20:18 Ask List
  2006-07-08 21:18 ` Chase Venters
  2006-07-08 22:19 ` Dr. David Alan Gilbert
@ 2006-07-09  7:20 ` Mike Galbraith
  2006-07-09 23:38   ` Horst von Brand
  2006-07-12  4:14   ` Ask List
  2006-07-09  8:33 ` Rik van Riel
  3 siblings, 2 replies; 13+ messages in thread
From: Mike Galbraith @ 2006-07-09  7:20 UTC (permalink / raw)
  To: Ask List; +Cc: linux-kernel

On Sat, 2006-07-08 at 20:18 +0000, Ask List wrote:
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 83  0   1328 301684  37868 1520632    0    0     0   264  400  1332 98  2  0  0
> 17  0   1328 293936  37868 1520688    0    0     0     0  537   979 97  3  0  0
> 73  0   1328 293688  37868 1520712    0    0     0     0  268  2643 98  2  0  0
> 80  0   1328 277220  37868 1520756    0    0     0     0  351   824 98  2  0  0
> 49  0   1328 262452  37868 1520800    0    0     0     0  393  1882 97  3  0  0
> 45  0   1328 246796  37868 1520828    0    0     0   304  302  1631 96  4  0  0
> 55  0   1328 243852  37868 1520872    0    0     0     0  356  1101 99  1  0  0
> 17  0   1328 228672  37868 1520916    0    0     0     0  336   748 97  3  0  0
>  0  0   1328 299948  37868 1520956    0    0     0     0  299   821 78  3 19  0
>  0  0   1328 299184  37868 1520960    0    0     0     0  168    78  8  0 92  0
>  0  0   1328 299184  37868 1520960    0    0     0   248  173    38  0  1 99  0
>  0  0   1328 299184  37868 1520960    0    0     0     0  160    20  0  0 100  0
>  0  0   1328 299184  37868 1520960    0    0     0     0  151     6  0  0 100  0
>  0  0   1328 299184  37868 1520960    0    0     0     0  162    42  0  1 99  0
>  1  0   1328 299188  37868 1520960    0    0     0     0  161    24  0  0 100  0
>  0  0   1328 298808  37868 1520988    0    0     0   100  303  1119 57  0 42  0
>  0  0   1328 298808  37868 1520988    0    0     0     0  162    22  0  1 99  0

Looking at the interrupts column, I suspect you have a network problem,
not a scheduler problem.  Looks to me like your SpamAssassins are simply
running out of work to do because your network traffic comes in bursts.
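
One way to check this reading against the posted numbers is to split the vmstat rows into busy (r > 0) and idle (r == 0) samples and compare the interrupt and context-switch rates. The following is a minimal sketch in Python; the helper names are made up for illustration, and the sample is just six rows copied from the output quoted above.

```python
VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
83  0   1328 301684  37868 1520632    0    0     0   264  400  1332 98  2  0  0
17  0   1328 293936  37868 1520688    0    0     0     0  537   979 97  3  0  0
73  0   1328 293688  37868 1520712    0    0     0     0  268  2643 98  2  0  0
 0  0   1328 299184  37868 1520960    0    0     0     0  168    78  8  0 92  0
 0  0   1328 299184  37868 1520960    0    0     0   248  173    38  0  1 99  0
 0  0   1328 299184  37868 1520960    0    0     0     0  160    20  0  0 100  0
"""

COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat(text):
    """Turn vmstat data rows into dicts keyed by column name; skip headers."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue  # header or malformed line
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def mean(values):
    """Arithmetic mean, 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


rows = parse_vmstat(VMSTAT_SAMPLE)
busy = [row for row in rows if row["r"] > 0]
idle = [row for row in rows if row["r"] == 0]
for label, group in (("busy", busy), ("idle", idle)):
    print(label,
          "in/s=%.0f" % mean(row["in"] for row in group),
          "cs/s=%.0f" % mean(row["cs"] for row in group))
```

If the interrupt rate falls together with the run queue, as it does in these rows, that is at least consistent with less work arriving during the idle windows. It does not by itself prove where the stall is.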

	-Mike


* Re: Runnable threads on run queue
  2006-07-08 20:18 Ask List
                   ` (2 preceding siblings ...)
  2006-07-09  7:20 ` Mike Galbraith
@ 2006-07-09  8:33 ` Rik van Riel
  2006-07-12  3:55   ` Ask List
  3 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2006-07-09  8:33 UTC (permalink / raw)
  To: Ask List; +Cc: linux-kernel

Ask List wrote:
> Have an issue maybe someone on this list can help with. 
> 
> At times of very high load the number of processes on the run queue drops to
>  0 then jumps really high and then drops to 0 and back and forth. It seems to
> last 10 seconds or so.

Are you using sendmail by any chance? :)

We start out with a low load average, so sendmail forks as many
spamassassins as it can...

> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 83  0   1328 301684  37868 1520632    0    0     0   264  400  1332 98  2  0  0
> 17  0   1328 293936  37868 1520688    0    0     0     0  537   979 97  3  0  0
> 73  0   1328 293688  37868 1520712    0    0     0     0  268  2643 98  2  0  0
> 80  0   1328 277220  37868 1520756    0    0     0     0  351   824 98  2  0  0
> 49  0   1328 262452  37868 1520800    0    0     0     0  393  1882 97  3  0  0
> 45  0   1328 246796  37868 1520828    0    0     0   304  302  1631 96  4  0  0
> 55  0   1328 243852  37868 1520872    0    0     0     0  356  1101 99  1  0  0
> 17  0   1328 228672  37868 1520916    0    0     0     0  336   748 97  3  0  0
>  0  0   1328 299948  37868 1520956    0    0     0     0  299   821 78  3 19  0
>  0  0   1328 299184  37868 1520960    0    0     0     0  168    78  8  0 92  0

... and guess what?

The load average went through the roof, so sendmail stops forking
spamassassins.  Now nothing is running, and sendmail will not start
forking new spamassassins again until after the load average has
decayed to an acceptable level.

After that, it will fork way too many at once again, and the load
average will go through the roof.  Lather, rinse, repeat.

You'd probably be better off limiting the number of simultaneous
local mail deliveries to something reasonable, so the load average
always stays at an acceptable level - and more importantly, all of
the CPU capacity could be used if needed...
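
The feedback loop described above can be sketched with a small simulation. This is a toy model under stated assumptions, not sendmail's actual logic: the function names, the threshold of 8, the burst size of 80, and the 5-second job length are all made up, and the load average is approximated as a per-second exponential average with a 60-second time constant (the kernel samples less often and uses fixed-point arithmetic).

```python
import math


def simulate(seconds=300, threshold=8.0, burst=80, job_seconds=5,
             max_running=None):
    """Return one (runnable, load_average) sample per simulated second."""
    decay = math.exp(-1.0 / 60.0)  # per-second decay of a 1-minute average
    load = 0.0
    running = []  # seconds of work left for each running job
    history = []
    for _ in range(seconds):
        if load < threshold:
            # Below the threshold the dispatcher starts a whole burst of jobs,
            # optionally limited by a cap on simultaneous jobs.
            room = burst
            if max_running is not None:
                room = max(0, min(burst, max_running - len(running)))
            running.extend([job_seconds] * room)
        runnable = len(running)
        load = load * decay + runnable * (1.0 - decay)
        history.append((runnable, load))
        # Every job burns one second; jobs on their last second finish.
        running = [left - 1 for left in running if left > 1]
    return history


def idle_windows(history):
    """Count how many times the system goes from busy to completely idle."""
    count = 0
    was_busy = False
    for runnable, _ in history:
        if runnable == 0 and was_busy:
            count += 1
        was_busy = runnable > 0
    return count


uncapped = simulate()
capped = simulate(max_running=4)
print("uncapped: idle windows =", idle_windows(uncapped))
print("capped:   idle windows =", idle_windows(capped))
```

In this model the uncapped run alternates between large bursts and stretches with nothing runnable while the load average decays, whereas capping concurrency at 4 keeps the load average below the threshold, so work is dispatched continuously and the idle-window count stays at zero.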

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

* Re: Runnable threads on run queue
       [not found] <fa.CQngdtRN/1xSBi2RLvhjLxBm1bE@ifi.uio.no>
@ 2006-07-09 16:11 ` Robert Hancock
  0 siblings, 0 replies; 13+ messages in thread
From: Robert Hancock @ 2006-07-09 16:11 UTC (permalink / raw)
  To: Ask List; +Cc: linux-kernel

Ask List wrote:
> Have an issue maybe someone on this list can help with. 
> 
> At times of very high load the number of processes on the run queue drops to
>  0 then jumps really high and then drops to 0 and back and forth. It seems to
> last 10 seconds or so. If you look at this vmstat you can see an example of 
> what I mean. Now I'm not a Linux kernel expert but I am thinking it has 
> something to do with the scheduling algorithm and locking of the run queue. 
> For this particular application I need all available threads to be processed as
> fast as possible. Is there a way for me to eliminate this behavior or at least
> minimize the window in which there are no threads on the run queue? Is there a
> sysctl parameter I can use?
> 
> Please help.

This seems like a userspace issue to me. There is no way the scheduler 
would let the system sit idle for 10 seconds with runnable processes. I 
think Rik van Riel's comment about sendmail reacting to increased load 
average may be related to what's going on here.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


* Re: Runnable threads on run queue
  2006-07-09  7:20 ` Mike Galbraith
@ 2006-07-09 23:38   ` Horst von Brand
  2006-07-12  4:14   ` Ask List
  1 sibling, 0 replies; 13+ messages in thread
From: Horst von Brand @ 2006-07-09 23:38 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ask List, linux-kernel

Mike Galbraith <efault@gmx.de> wrote:
> On Sat, 2006-07-08 at 20:18 +0000, Ask List wrote:
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa

[...]

> Looking at the interrupts column, I suspect you have a network problem,
> not a scheduler problem.  Looks to me like your SpamAssassins are simply
> running out of work to do because your network traffic comes in bursts.

SpamAssassin acted up here some time ago. With personal training and certain
messages it went into a loop and the load went through the roof. We couldn't
find a cure, and some hundred users with large personalized rule files
were causing problems anyway, so we axed it.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


* Re: Runnable threads on run queue
  2006-07-09  8:33 ` Rik van Riel
@ 2006-07-12  3:55   ` Ask List
  0 siblings, 0 replies; 13+ messages in thread
From: Ask List @ 2006-07-12  3:55 UTC (permalink / raw)
  To: linux-kernel

We are not running sendmail. We developed our own mail server in-house. We have
a cluster of these mail servers sending spam traffic to a cluster of SA servers.
We use the round-robin parameter when starting the spamd process, and we start
the daemon with a ton of min/spare/max children, so we don't see the forking
issue you mention.

* Re: Runnable threads on run queue
  2006-07-09  7:20 ` Mike Galbraith
  2006-07-09 23:38   ` Horst von Brand
@ 2006-07-12  4:14   ` Ask List
  2006-07-12  5:40     ` Mike Galbraith
  1 sibling, 1 reply; 13+ messages in thread
From: Ask List @ 2006-07-12  4:14 UTC (permalink / raw)
  To: linux-kernel

Mike Galbraith <efault <at> gmx.de> writes:
...
> Looking at the interrupts column, I suspect you have a network problem,
> not a scheduler problem.  Looks to me like your SpamAssassins are simply
> running out of work to do because your network traffic comes in bursts.
> 
> 	-Mike
> 
> 

Network problem? So you're saying our mail servers are not sending spam traffic
fast enough, and the SpamAssassin processes are running out of work to do? So
when our mail servers are not sending spam traffic, we see our CPU, cs,
interrupts, and runnable threads drop ...?

I'd really like to believe this is true, but in the SA logs there are still
plenty of B (busy) threads...

* Re: Runnable threads on run queue
  2006-07-12  4:14   ` Ask List
@ 2006-07-12  5:40     ` Mike Galbraith
  2006-07-13 19:05       ` Ask List
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Galbraith @ 2006-07-12  5:40 UTC (permalink / raw)
  To: Ask List; +Cc: linux-kernel

On Wed, 2006-07-12 at 04:14 +0000, Ask List wrote:
> Network Problem? So you're saying our mail servers are not sending spam traffic
> fast enough if spam assassin processes are running out of work to do? So when
> our mail servers are not sending spam traffic we see our cpu,cs,interrupts, &
> runnable threads drop ...?

More or less, yes.  I think somebody is dropping the communication ball.

	-Mike


* Re: Runnable threads on run queue
  2006-07-12  5:40     ` Mike Galbraith
@ 2006-07-13 19:05       ` Ask List
  0 siblings, 0 replies; 13+ messages in thread
From: Ask List @ 2006-07-13 19:05 UTC (permalink / raw)
  To: linux-kernel


I'll look into it. Thanks for the input.
