public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* PROBLEM: spurious temporary freeze, scheduler and preempt problem ?
@ 2004-02-18 15:26 Brice Figureau
  2004-02-18 18:12 ` Eric
  0 siblings, 1 reply; 4+ messages in thread
From: Brice Figureau @ 2004-02-18 15:26 UTC (permalink / raw)
  To: linux-kernel

Hi all,

I'm working on a java server that uses some native libs (aka JNI). The
server is poll(2)ing a TCP socket with the help of NIO, and dispatch
network requests to a pool of a threads (currently 4) that send back
responses to the client. Those threads are mostly executing native code
(with JNI). The server is using a mysql engine running on the same host
to store a few things (about 10 records stored per seconds if that
matters).

I'm running 2.6.3-rc2 on a UP Celeron (same problem on a PIII too)
system with plenty of RAM, but which runs several other daemons
(postfix, djbdns... those daemons were not loading the machine during
the tests).
The machine was never swapping (see the vmstat output below [2,3]).

I noticed a few weeks ago that when I try to stress-test this particular
application the host on which it runs completely freeze for 3 to 5
seconds, nothing seems to run at all during this time (this was
especially visible during my first test as it was my devl computer
running X and the whole screen was frozen).

I decided to perform torough testing running on quiet machine without X.
During my testbed I tried to capture most data I could (vmstat,
profiling...).

What I've seen is the following:

1) During the freeze, the system was using almost 80% CPU time in kernel
and the remaining in user space

2) During the freeze, even changing from virtual-console to another
could have been frozen. No virtual-console would be alive.

3) a top d 1 or top d 0.5 show that my application was running before
the freeze and after the freeze, with 80% of CPU time in kernel space.

4) load was increasing faster than I could think (>19 after 30 seconds)

I first thought that a profiling (with profile=1 or oprofile) would give
me a good candidate about the problem, but I'm not sure right now (see
the profile output at the end of the mail [4,5])

After that I tried to run vmstat 1 to have a better idea of the problem
[2,3]. You can see the high amount of context-switch and interrupt. I
think that this is because vmstat was stuck (like the other process)
during a few seconds, but the kernel counters were still incrementing.

Then I finally tried several differents kernel and found the following
results:

2.6.3-rc2  -> exhibit the high load problem
2.4.25-rc3 -> immune to the problem, the load never goes beyong 1.20

I finally did a binary search between several version of 2.6 and found
that any kernel *before* 2.6.0-test5 (included) are immune to the
problem and any kernel *after* 2.6.0-test6 (included) have the same load
problem.

On a side note, 2.6.0-test4-mm2 (the only mm kernel I had already
compiled) had the problem too.

So what are the differences between test5 and test6 ? 
It seems that the sched_interactive patch from Con Kolivas was included
in test6, can it be the cause ?

By running and looking at a 'top d 0.5' I couldn't spot any bizarre
dynamic priorities (my java app is at 15 or 16, never more).
The only difference I could spot is that kernel with interactive
scheduler seemed to make runnable several java threads at the same time
(about 3 to 4) unlike the previous scheduler (only 1~2 runnable process
at any time).

Something I should also say is that I tested a 2.6.3-rc2 *without*
preempt and found this kernel to be immune, too. 

I also removed all the sql code in my application and no more freeze !
(Although that does not explain why the combination of the two gives
those results).

Can someone help me find what the problem is ?

Useful data:

1) ver_linux
Linux arsenic 2.6.3-rc2 #2 SMP Wed Feb 18 11:35:24 CET 2004 i686 unknown
unknown GNU/Linux
  
Gnu C                  3.3.1
Gnu make               3.80
util-linux             2.12
mount                  2.11z
module-init-tools      0.9.11a
e2fsprogs              1.34
quota-tools            3.09.
PPP                    2.4.1
nfs-utils              1.0.5
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.1.11
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.0
Modules Loaded


2) vmstat for a kernel with the problem:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 3  0      0 371136   8044  61096    0    0    16   120 1035   351 63  3 30  4
 3  0      0 371136   8044  61112    0    0    12     0 1033   532 78  4 18  0
21  0      0 371024   8064  61116    0    0     0   148 9038 630633 29 71  0  0
 4  0      0 371024   8064  61120    0    0     0     0 1064 43014 58 42  0  0
 4  0      0 371024   8064  61120    0    0     0     0 1159 78750 38 62  0  0
 4  0      0 371040   8080  61120    0    0     0    76 4298 309703 29 71  0  0
15  0      0 370976   8096  61120    0    0     0    96 6833 501193 27 73  0  0
 3  0      0 370992   8096  61124    0    0     0     0 1094 44694 56 44  0  0
 3  0      0 370992   8096  61124    0    0     0     0 1087 65851 42 58  0  0
16  0      0 371024   8112  61124    0    0     0    76 10511 794459 26 74  0  0
 4  0      0 371024   8112  61124    0    0     0     0 1035 56591 50 50  0  0
 3  0      0 371024   8112  61124    0    0     0     0 1025 63930 41 59  0  0
 3  0      0 371024   8112  61124    0    0     0     0 1007 54424 47 53  0  0
 5  0      0 371024   8112  61124    0    0     0     0 1100 56909 53 47  0  0
 5  0      0 370960   8128  61128    0    0     0    60 1154 71518 37 63  0  0
 4  0      0 370960   8128  61128    0    0     0     0 1041 54728 51 49  0  0
12  0      0 370960   8128  61128    0    0     0    12 3147 223442 31 69  0  0

huge freeze before the following line
15  0      0 370952   8152  61128    0    0     0   112 13393 1014908 25 75  0  0
 3  0      0 370944   8152  61128    0    0     0     0 1006 55046 43 57  0  0
 3  0      0 370944   8152  61128    0    0     0     0 1111 66487 40 60  0  0
21  0      0 370920   8168  61128    0    0     0    68 20308 1531575 26 74  0  0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
14  0      0 370888   8176  61132    0    0     0    20 4228 313614 25 75  0  0
18  0      0 370912   8200  61132    0    0     0   112 8133 610932 26 74  0  0
 8  0      0 371488   8200  61132    0    0     0    48 1513 104797 31 69  0  0
29  0      0 371440   8216  61132    0    0     0    68 4490 333022 26 74  0  0

3) vmstat for a kernel without the problem

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0 389060   7884  57472    0    0   153    38 1023    91  5  3 85  7
 2  0      0 388020   7884  57492    0    0     4     0 1012   127  3  3 92  2
 2  0      0 380940   7884  57492    0    0     0     0 1010   440 94  6  0  0
 0  1      0 378636   7928  58460    0    0  1004    16 1173   330 47  5 21 26
 0  0      0 378276   7928  58600    0    0   136     0 1038   291 68 10 11 11
 3  0      0 377828   7944  58652    0    0    52   268 1021   261 68 10 17  5
 3  0      0 377188   7952  58768    0    0   112    24 1017   278 63  7 24  6
 2  0      0 377060   7952  58828    0    0    56     0 1016   276 57  7 27  9
 3  0      0 377060   7960  58864    0    0    36    88 1020   223 65  8 23  4
 0  0      0 376868   7960  58896    0    0    28     0 1009   234 66 10 23  1
 3  0      0 376820   7968  58936    0    0    36   104 1032   229 61  9 26  4
 3  0      0 376756   7968  58972    0    0    32     0 1010   274 47  6 43  3
 0  0      0 376564   7968  59000    0    0    28     0 1009   415 66  3 30  1
 5  0      0 376308   7976  59012    0    0     8    92 1016   387 56  5 39  0
 2  0      0 376236   7976  59080    0    0    68     0 1004   290 78  1 21  0
 4  0      0 375980   7984  59088    0    0     4   204 1056   408 45  5 50  0
 1  0      0 375788   7984  59092    0    0     4     0 1003   330 87  4  9  0
 0  0      0 375596   7984  59096    0    0     0     0 1002   281 74  3 23  0
 0  0      0 375404   7992  59108    0    0     8   120 1024   289 43  4 39 15
 1  0      0 375276   7992  59124    0    0    12     0 1005   394 39  3 57  1
 4  0      0 375204   8000  59132    0    0     4    16 1006  1083 95  5  0  0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 3  0      0 375140   8000  59132    0    0     0     0 1002 95433 37 63  0  0
 3  0      0 375060   8000  59136    0    0     0     0 1012 102262 33 67  0  0
 6  0      0 375044   8008  59136    0    0     0   112 1025 83959 43 57  0  0
 5  0      0 374988   8008  59136    0    0     0     0 1002   232 98  2  0  0
 3  0      0 375052   8016  59136    0    0     0    20 1005   211 98  2  0  0
 3  0      0 375052   8016  59136    0    0     0     0 1007 43272 71 29  0  0
 2  0      0 375052   8016  59140    0    0     0     0 1023 82448 59 41  0  0
 3  0      0 374988   8024  59140    0    0     0    84 1022 132541 39 61  0  0
 3  0      0 374988   8024  59140    0    0     0     0 1002 21207 89 11  0  0
 5  0      0 375004   8032  59144    0    0     0    20 1005 45680 75 25  0  0
 3  0      0 375004   8032  59144    0    0     0     0 1012 24392 82 18  0  0
 3  0      0 375004   8032  59144    0    0     0     0 1015 82373 48 52  0  0
 2  0      0 375004   8032  59144    0    0     0     0 1005 77767 61 39  0  0
 4  0      0 375020   8040  59144    0    0     0    68 1009 148047 31 69  0  0
 3  0      0 375020   8048  59144    0    0     0    32 1008 127918 38 62  0  0
 1  0      0 374956   8048  59144    0    0     0     0 1011 91847 39 61  0  0
 3  0      0 374956   8048  59148    0    0     0     0 1002 94842 51 49  0  0
 4  0      0 374972   8048  59148    0    0     0     0 1006 126960 38 62  0  0
 2  0      0 374972   8056  59148    0    0     0    60 1009 76808 65 35  0  0
 1  0      0 374972   8064  59148    0    0     0   184 1037 111781 32 68  0  0
 2  0      0 374972   8064  59148    0    0     0     0 1002 120766 36 64  0  0

4) readprofile for a kernel with the problem (analysis during atmost 1min)

c0191b55 proc_pid_stat                                14   0.0108
c01936cc write_profile                                15   0.0794
c0241e40 do_con_write                                 15   0.0081
c0130e95 sys_tgkill                                   16   0.0438
c0176dd2 __d_lookup                                   18   0.0496
c018cf88 task_statm                                   18   0.1818
c02037d2 __copy_user_intel                            20   0.1163
c02025f3 number                                       28   0.0452
c014e3a8 do_anonymous_page                            32   0.0516
c012eab4 kill_something_info                         175   0.5591
c0109480 syscall_call                                239  21.7273
c01354ad find_task_by_pid                            280   3.6364
c012dd86 check_kill_permission                       316   1.8588
c010948b syscall_exit                                336  30.5455
c0109f34 device_not_available                        360   8.5714
c0111313 restore_i387_fxsave                         403   2.4132
c01209af __might_sleep                               625   3.0340
c01113ba restore_i387                                673   4.6414
c010870a restore_sigcontext                          753   2.4770
c0130e43 sys_kill                                    757   9.2317
c0108a73 setup_sigcontext                            935   3.1481
c0114ffa sched_clock                                1189   9.0076
c0111111 save_i387_fxsave                           1214   4.8367
c011120c save_i387                                  1872   7.1179
c0111027 convert_fxsr_from_user                     1974   8.4359
c012ea44 kill_proc_info                             2217  19.7946
c0130613 sys_rt_sigprocmask                         2220   5.2857
c0108b9c setup_frame                                2250   4.8077
c010883a sys_sigreturn                              3405  13.3008
c0110eca convert_fxsr_to_user                       3620  10.3725
c0109009 handle_signal                              3927  11.6183
c01084cc sys_rt_sigsuspend                          4121  13.4235
c020399a __copy_from_user_ll                        6233  54.6754
c010915b do_signal                                  6656  24.7435
c020392a __copy_to_user_ll                          7219  64.4554
c01300d4 get_signal_to_deliver                      9293   8.6447
c012e484 group_send_sig_info                       14040  13.2578
c011e5a1 schedule                                  15029   8.2396
c0109454 system_call                               21609 491.1136
00000000 total                                    114435   0.0412

5) profile for a kernel without the problem (in fact a 2.6.3-rc2 without preempt)

c027983c vgacon_cursor                                24   0.0612
c0108d64 syscall_call                                 25   2.2727
c013a306 do_anonymous_page                            25   0.0627
c016e47c pid_revalidate                               26   0.1486
c01d7a86 __copy_user_intel                            28   0.1628
c010f2ed restore_i387_fxsave                          31   0.2138
c0127742 find_task_by_pid                             31   0.4026
c020d598 conv_uni_to_pc                               34   0.1932
c02c2b7f tcp_poll                                     36   0.0945
c010f17d save_i387_fxsave                             40   0.2062
c0153114 link_path_walk                               43   0.0205
c016edac get_tgid_list                                47   0.3072
c0147da7 fget_light                                   49   0.4949
c015c21f __d_lookup                                   51   0.2099
c021308d do_con_write                                 53   0.0287
c0123e02 sys_kill                                     57   0.6951
c010842a setup_sigcontext                             59   0.1987
c01d6bba vsnprintf                                    63   0.0510
c0108144 restore_sigcontext                           66   0.2171
c01702da proc_pid_stat                                69   0.0681
c0116bed __might_sleep                                70   0.4516
c0108553 setup_frame                                  72   0.1538
c016c9de task_statm                                   83   0.8384
c0112a5e sched_clock                                  90   0.6818
c0108274 sys_sigreturn                                93   0.4794
c01236d1 sys_rt_sigprocmask                           99   0.2750
c01089c0 handle_signal                               122   0.4342
c010f093 convert_fxsr_from_user                      143   0.6111
c0107f42 sys_rt_sigsuspend                           146   0.5911
c01d694f number                                      157   0.2536
c010ef36 convert_fxsr_to_user                        179   0.5129
c0108ad9 do_signal                                   463   1.8016
c01d7c4e __copy_from_user_ll                         476   4.1754
c0123300 get_signal_to_deliver                       522   0.6788
c0121d45 group_send_sig_info                         646   0.7242
c01d7bde __copy_to_user_ll                           695   6.2054
c011577c schedule                                    889   0.6679
c0108d38 system_call                                1642  37.3182
c0106d59 default_idle                              53370 1241.1628
00000000 total                                     61980   0.0289

6) Sysreq-T during a freeze available on request.
7) .config or other info available on request

Thank you for your help.
PS: I'm not subscribed to the list, please CC: me if you answer.

--
Brice Figureau



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PROBLEM: spurious temporary freeze, scheduler and preempt problem ?
  2004-02-18 15:26 PROBLEM: spurious temporary freeze, scheduler and preempt problem ? Brice Figureau
@ 2004-02-18 18:12 ` Eric
  2004-02-19  0:27   ` Nick Piggin
  0 siblings, 1 reply; 4+ messages in thread
From: Eric @ 2004-02-18 18:12 UTC (permalink / raw)
  To: Brice Figureau, linux-kernel

On Wednesday 18 February 2004 9:26 am, Brice Figureau wrote:
Someone correct me if I'm way off base, but have you tried
	scheduler=deadline
To try the different kernel schedulers?
Im not sure about the keyword, but do a quick google or search lkml and you'll 
see the correct keyword and the different schedulers you can try.


> Hi all,
>
> I'm working on a java server that uses some native libs (aka JNI). The
> server is poll(2)ing a TCP socket with the help of NIO, and dispatch
> network requests to a pool of a threads (currently 4) that send back
> responses to the client. Those threads are mostly executing native code
> (with JNI). The server is using a mysql engine running on the same host
> to store a few things (about 10 records stored per seconds if that
> matters).
>
> I'm running 2.6.3-rc2 on a UP Celeron (same problem on a PIII too)
> system with plenty of RAM, but which runs several other daemons
> (postfix, djbdns... those daemons were not loading the machine during
> the tests).
> The machine was never swapping (see the vmstat output below [2,3]).
>
> I noticed a few weeks ago that when I try to stress-test this particular
> application the host on which it runs completely freeze for 3 to 5
> seconds, nothing seems to run at all during this time (this was
> especially visible during my first test as it was my devl computer
> running X and the whole screen was frozen).
>
> I decided to perform torough testing running on quiet machine without X.
> During my testbed I tried to capture most data I could (vmstat,
> profiling...).
>
> What I've seen is the following:
>
> 1) During the freeze, the system was using almost 80% CPU time in kernel
> and the remaining in user space
>
> 2) During the freeze, even changing from virtual-console to another
> could have been frozen. No virtual-console would be alive.
>
> 3) a top d 1 or top d 0.5 show that my application was running before
> the freeze and after the freeze, with 80% of CPU time in kernel space.
>
> 4) load was increasing faster than I could think (>19 after 30 seconds)
>
> I first thought that a profiling (with profile=1 or oprofile) would give
> me a good candidate about the problem, but I'm not sure right now (see
> the profile output at the end of the mail [4,5])
>
> After that I tried to run vmstat 1 to have a better idea of the problem
> [2,3]. You can see the high amount of context-switch and interrupt. I
> think that this is because vmstat was stuck (like the other process)
> during a few seconds, but the kernel counters were still incrementing.
>
> Then I finally tried several differents kernel and found the following
> results:
>
> 2.6.3-rc2  -> exhibit the high load problem
> 2.4.25-rc3 -> immune to the problem, the load never goes beyong 1.20
>
> I finally did a binary search between several version of 2.6 and found
> that any kernel *before* 2.6.0-test5 (included) are immune to the
> problem and any kernel *after* 2.6.0-test6 (included) have the same load
> problem.
>
> On a side note, 2.6.0-test4-mm2 (the only mm kernel I had already
> compiled) had the problem too.
>
> So what are the differences between test5 and test6 ?
> It seems that the sched_interactive patch from Con Kolivas was included
> in test6, can it be the cause ?
>
> By running and looking at a 'top d 0.5' I couldn't spot any bizarre
> dynamic priorities (my java app is at 15 or 16, never more).
> The only difference I could spot is that kernel with interactive
> scheduler seemed to make runnable several java threads at the same time
> (about 3 to 4) unlike the previous scheduler (only 1~2 runnable process
> at any time).
>
> Something I should also say is that I tested a 2.6.3-rc2 *without*
> preempt and found this kernel to be immune, too.
>
> I also removed all the sql code in my application and no more freeze !
> (Although that does not explain why the combination of the two gives
> those results).
>
> Can someone help me find what the problem is ?
>
> Useful data:
>
> 1) ver_linux
> Linux arsenic 2.6.3-rc2 #2 SMP Wed Feb 18 11:35:24 CET 2004 i686 unknown
> unknown GNU/Linux
>
> Gnu C                  3.3.1
> Gnu make               3.80
> util-linux             2.12
> mount                  2.11z
> module-init-tools      0.9.11a
> e2fsprogs              1.34
> quota-tools            3.09.
> PPP                    2.4.1
> nfs-utils              1.0.5
> Linux C Library        2.3.2
> Dynamic linker (ldd)   2.3.2
> Procps                 3.1.11
> Net-tools              1.60
> Console-tools          0.2.3
> Sh-utils               5.0
> Modules Loaded
>
>
> 2) vmstat for a kernel with the problem:
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu---- r  b   swpd   free   buff  cache   si   so    bi    bo   in   
> cs us sy id wa 3  0      0 371136   8044  61096    0    0    16   120 1035 
>  351 63  3 30  4 3  0      0 371136   8044  61112    0    0    12     0
> 1033   532 78  4 18  0 21  0      0 371024   8064  61116    0    0     0  
> 148 9038 630633 29 71  0  0 4  0      0 371024   8064  61120    0    0    
> 0     0 1064 43014 58 42  0  0 4  0      0 371024   8064  61120    0    0  
>   0     0 1159 78750 38 62  0  0 4  0      0 371040   8080  61120    0    0
>     0    76 4298 309703 29 71  0  0 15  0      0 370976   8096  61120    0 
>   0     0    96 6833 501193 27 73  0  0 3  0      0 370992   8096  61124   
> 0    0     0     0 1094 44694 56 44  0  0 3  0      0 370992   8096  61124 
>   0    0     0     0 1087 65851 42 58  0  0 16  0      0 371024   8112 
> 61124    0    0     0    76 10511 794459 26 74  0  0 4  0      0 371024  
> 8112  61124    0    0     0     0 1035 56591 50 50  0  0 3  0      0 371024
>   8112  61124    0    0     0     0 1025 63930 41 59  0  0 3  0      0
> 371024   8112  61124    0    0     0     0 1007 54424 47 53  0  0 5  0     
> 0 371024   8112  61124    0    0     0     0 1100 56909 53 47  0  0 5  0   
>   0 370960   8128  61128    0    0     0    60 1154 71518 37 63  0  0 4  0 
>     0 370960   8128  61128    0    0     0     0 1041 54728 51 49  0  0 12 
> 0      0 370960   8128  61128    0    0     0    12 3147 223442 31 69  0  0
>
> huge freeze before the following line
> 15  0      0 370952   8152  61128    0    0     0   112 13393 1014908 25 75
>  0  0 3  0      0 370944   8152  61128    0    0     0     0 1006 55046 43
> 57  0  0 3  0      0 370944   8152  61128    0    0     0     0 1111 66487
> 40 60  0  0 21  0      0 370920   8168  61128    0    0     0    68 20308
> 1531575 26 74  0  0 procs -----------memory---------- ---swap-- -----io----
> --system-- ----cpu---- r  b   swpd   free   buff  cache   si   so    bi   
> bo   in    cs us sy id wa 14  0      0 370888   8176  61132    0    0     0
>    20 4228 313614 25 75  0  0 18  0      0 370912   8200  61132    0    0  
>   0   112 8133 610932 26 74  0  0 8  0      0 371488   8200  61132    0   
> 0     0    48 1513 104797 31 69  0  0 29  0      0 371440   8216  61132   
> 0    0     0    68 4490 333022 26 74  0  0
>
> 3) vmstat for a kernel without the problem
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu---- r  b   swpd   free   buff  cache   si   so    bi    bo   in   
> cs us sy id wa 0  0      0 389060   7884  57472    0    0   153    38 1023 
>   91  5  3 85  7 2  0      0 388020   7884  57492    0    0     4     0
> 1012   127  3  3 92  2 2  0      0 380940   7884  57492    0    0     0    
> 0 1010   440 94  6  0  0 0  1      0 378636   7928  58460    0    0  1004  
>  16 1173   330 47  5 21 26 0  0      0 378276   7928  58600    0    0   136
>     0 1038   291 68 10 11 11 3  0      0 377828   7944  58652    0    0   
> 52   268 1021   261 68 10 17  5 3  0      0 377188   7952  58768    0    0 
>  112    24 1017   278 63  7 24  6 2  0      0 377060   7952  58828    0   
> 0    56     0 1016   276 57  7 27  9 3  0      0 377060   7960  58864    0 
>   0    36    88 1020   223 65  8 23  4 0  0      0 376868   7960  58896   
> 0    0    28     0 1009   234 66 10 23  1 3  0      0 376820   7968  58936 
>   0    0    36   104 1032   229 61  9 26  4 3  0      0 376756   7968 
> 58972    0    0    32     0 1010   274 47  6 43  3 0  0      0 376564  
> 7968  59000    0    0    28     0 1009   415 66  3 30  1 5  0      0 376308
>   7976  59012    0    0     8    92 1016   387 56  5 39  0 2  0      0
> 376236   7976  59080    0    0    68     0 1004   290 78  1 21  0 4  0     
> 0 375980   7984  59088    0    0     4   204 1056   408 45  5 50  0 1  0   
>   0 375788   7984  59092    0    0     4     0 1003   330 87  4  9  0 0  0 
>     0 375596   7984  59096    0    0     0     0 1002   281 74  3 23  0 0 
> 0      0 375404   7992  59108    0    0     8   120 1024   289 43  4 39 15
> 1  0      0 375276   7992  59124    0    0    12     0 1005   394 39  3 57 
> 1 4  0      0 375204   8000  59132    0    0     4    16 1006  1083 95  5 
> 0  0 procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu---- r  b   swpd   free   buff  cache   si   so    bi    bo   in   
> cs us sy id wa 3  0      0 375140   8000  59132    0    0     0     0 1002
> 95433 37 63  0  0 3  0      0 375060   8000  59136    0    0     0     0
> 1012 102262 33 67  0  0 6  0      0 375044   8008  59136    0    0     0  
> 112 1025 83959 43 57  0  0 5  0      0 374988   8008  59136    0    0     0
>     0 1002   232 98  2  0  0 3  0      0 375052   8016  59136    0    0    
> 0    20 1005   211 98  2  0  0 3  0      0 375052   8016  59136    0    0  
>   0     0 1007 43272 71 29  0  0 2  0      0 375052   8016  59140    0    0
>     0     0 1023 82448 59 41  0  0 3  0      0 374988   8024  59140    0   
> 0     0    84 1022 132541 39 61  0  0 3  0      0 374988   8024  59140    0
>    0     0     0 1002 21207 89 11  0  0 5  0      0 375004   8032  59144   
> 0    0     0    20 1005 45680 75 25  0  0 3  0      0 375004   8032  59144 
>   0    0     0     0 1012 24392 82 18  0  0 3  0      0 375004   8032 
> 59144    0    0     0     0 1015 82373 48 52  0  0 2  0      0 375004  
> 8032  59144    0    0     0     0 1005 77767 61 39  0  0 4  0      0 375020
>   8040  59144    0    0     0    68 1009 148047 31 69  0  0 3  0      0
> 375020   8048  59144    0    0     0    32 1008 127918 38 62  0  0 1  0    
>  0 374956   8048  59144    0    0     0     0 1011 91847 39 61  0  0 3  0  
>    0 374956   8048  59148    0    0     0     0 1002 94842 51 49  0  0 4  0
>      0 374972   8048  59148    0    0     0     0 1006 126960 38 62  0  0 2
>  0      0 374972   8056  59148    0    0     0    60 1009 76808 65 35  0  0
> 1  0      0 374972   8064  59148    0    0     0   184 1037 111781 32 68  0
>  0 2  0      0 374972   8064  59148    0    0     0     0 1002 120766 36 64
>  0  0
>
> 4) readprofile for a kernel with the problem (analysis during atmost 1min)
>
> c0191b55 proc_pid_stat                                14   0.0108
> c01936cc write_profile                                15   0.0794
> c0241e40 do_con_write                                 15   0.0081
> c0130e95 sys_tgkill                                   16   0.0438
> c0176dd2 __d_lookup                                   18   0.0496
> c018cf88 task_statm                                   18   0.1818
> c02037d2 __copy_user_intel                            20   0.1163
> c02025f3 number                                       28   0.0452
> c014e3a8 do_anonymous_page                            32   0.0516
> c012eab4 kill_something_info                         175   0.5591
> c0109480 syscall_call                                239  21.7273
> c01354ad find_task_by_pid                            280   3.6364
> c012dd86 check_kill_permission                       316   1.8588
> c010948b syscall_exit                                336  30.5455
> c0109f34 device_not_available                        360   8.5714
> c0111313 restore_i387_fxsave                         403   2.4132
> c01209af __might_sleep                               625   3.0340
> c01113ba restore_i387                                673   4.6414
> c010870a restore_sigcontext                          753   2.4770
> c0130e43 sys_kill                                    757   9.2317
> c0108a73 setup_sigcontext                            935   3.1481
> c0114ffa sched_clock                                1189   9.0076
> c0111111 save_i387_fxsave                           1214   4.8367
> c011120c save_i387                                  1872   7.1179
> c0111027 convert_fxsr_from_user                     1974   8.4359
> c012ea44 kill_proc_info                             2217  19.7946
> c0130613 sys_rt_sigprocmask                         2220   5.2857
> c0108b9c setup_frame                                2250   4.8077
> c010883a sys_sigreturn                              3405  13.3008
> c0110eca convert_fxsr_to_user                       3620  10.3725
> c0109009 handle_signal                              3927  11.6183
> c01084cc sys_rt_sigsuspend                          4121  13.4235
> c020399a __copy_from_user_ll                        6233  54.6754
> c010915b do_signal                                  6656  24.7435
> c020392a __copy_to_user_ll                          7219  64.4554
> c01300d4 get_signal_to_deliver                      9293   8.6447
> c012e484 group_send_sig_info                       14040  13.2578
> c011e5a1 schedule                                  15029   8.2396
> c0109454 system_call                               21609 491.1136
> 00000000 total                                    114435   0.0412
>
> 5) profile for a kernel without the problem (in fact a 2.6.3-rc2 without
> preempt)
>
> c027983c vgacon_cursor                                24   0.0612
> c0108d64 syscall_call                                 25   2.2727
> c013a306 do_anonymous_page                            25   0.0627
> c016e47c pid_revalidate                               26   0.1486
> c01d7a86 __copy_user_intel                            28   0.1628
> c010f2ed restore_i387_fxsave                          31   0.2138
> c0127742 find_task_by_pid                             31   0.4026
> c020d598 conv_uni_to_pc                               34   0.1932
> c02c2b7f tcp_poll                                     36   0.0945
> c010f17d save_i387_fxsave                             40   0.2062
> c0153114 link_path_walk                               43   0.0205
> c016edac get_tgid_list                                47   0.3072
> c0147da7 fget_light                                   49   0.4949
> c015c21f __d_lookup                                   51   0.2099
> c021308d do_con_write                                 53   0.0287
> c0123e02 sys_kill                                     57   0.6951
> c010842a setup_sigcontext                             59   0.1987
> c01d6bba vsnprintf                                    63   0.0510
> c0108144 restore_sigcontext                           66   0.2171
> c01702da proc_pid_stat                                69   0.0681
> c0116bed __might_sleep                                70   0.4516
> c0108553 setup_frame                                  72   0.1538
> c016c9de task_statm                                   83   0.8384
> c0112a5e sched_clock                                  90   0.6818
> c0108274 sys_sigreturn                                93   0.4794
> c01236d1 sys_rt_sigprocmask                           99   0.2750
> c01089c0 handle_signal                               122   0.4342
> c010f093 convert_fxsr_from_user                      143   0.6111
> c0107f42 sys_rt_sigsuspend                           146   0.5911
> c01d694f number                                      157   0.2536
> c010ef36 convert_fxsr_to_user                        179   0.5129
> c0108ad9 do_signal                                   463   1.8016
> c01d7c4e __copy_from_user_ll                         476   4.1754
> c0123300 get_signal_to_deliver                       522   0.6788
> c0121d45 group_send_sig_info                         646   0.7242
> c01d7bde __copy_to_user_ll                           695   6.2054
> c011577c schedule                                    889   0.6679
> c0108d38 system_call                                1642  37.3182
> c0106d59 default_idle                              53370 1241.1628
> 00000000 total                                     61980   0.0289
>
> 6) Sysreq-T during a freeze available on request.
> 7) .config or other info available on request
>
> Thank you for your help.
> PS: I'm not subscribed to the list, please CC: me if you answer.
>
> --
> Brice Figureau
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
-------------------------
Eric Bambach
Eric at cisu dot net
-------------------------

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PROBLEM: spurious temporary freeze, scheduler and preempt problem ?
  2004-02-18 18:12 ` Eric
@ 2004-02-19  0:27   ` Nick Piggin
  2004-02-19  9:41     ` Brice Figureau
  0 siblings, 1 reply; 4+ messages in thread
From: Nick Piggin @ 2004-02-19  0:27 UTC (permalink / raw)
  To: Eric; +Cc: Brice Figureau, linux-kernel



Eric wrote:

>On Wednesday 18 February 2004 9:26 am, Brice Figureau wrote:
>Someone correct me if I'm way off base, but have you tried
>	scheduler=deadline
>To try the different kernel schedulers?
>Im not sure about the keyword, but do a quick google or search lkml and you'll 
>see the correct keyword and the different schedulers you can try.
>
>
>

It's probably a CPU scheduler related problem, so this
wouldn't do much.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PROBLEM: spurious temporary freeze, scheduler and preempt problem ?
  2004-02-19  0:27   ` Nick Piggin
@ 2004-02-19  9:41     ` Brice Figureau
  0 siblings, 0 replies; 4+ messages in thread
From: Brice Figureau @ 2004-02-19  9:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: Eric, Nick Piggin

Eric, Nick,

On Thu, 2004-02-19 at 01:27, Nick Piggin wrote:
> Eric wrote:
> 
> >On Wednesday 18 February 2004 9:26 am, Brice Figureau wrote:
> >Someone correct me if I'm way off base, but have you tried
> >	scheduler=deadline
> >To try the different kernel schedulers?
> >Im not sure about the keyword, but do a quick google or search lkml and you'll 
> >see the correct keyword and the different schedulers you can try.
>
> It's probably a CPU scheduler related problem, so this
> wouldn't do much.

Yes it is more a CPU scheduler problem, than an io one (although I
tested with elevator=deadline just to be sure ;-) and I had the same
problem).

--
Brice Figureau


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-02-19  9:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-18 15:26 PROBLEM: spurious temporary freeze, scheduler and preempt problem ? Brice Figureau
2004-02-18 18:12 ` Eric
2004-02-19  0:27   ` Nick Piggin
2004-02-19  9:41     ` Brice Figureau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox