fork: Resource temporarily unavailable / cant start new threads


* fork: Resource temporarily unavailable / cant start new threads
@ 2008-05-20 18:26 mark
  2008-05-21 20:28 ` Randy Dunlap
  0 siblings, 1 reply; 14+ messages in thread
From: mark @ 2008-05-20 18:26 UTC (permalink / raw)
  To: linux-kernel

I upgraded to 2.6.25.3-18.fc9.x86_64 (Fedora 9), and now I regularly get
this error when I try to log in to the box, kill a process, start a
Python app, or do almost anything else.

fork: Resource temporarily unavailable

I have over 10GB RAM free, and zero swap space used. The box is a
dual quad core Intel Xeon 5405 with 16GB RAM.

There is no error message in /var/log/messages or dmesg ...
how do I identify the problem?
thanks!

uname -a
Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
x86_64 x86_64 x86_64 GNU/Linux


free -m
            total       used       free     shared    buffers     cached
Mem:         16086       3189      12896          0         42        666
-/+ buffers/cache:       2481      13605
Swap:         1983          0       1983


I have only 505 processes running:
ps aux | wc -l
505


uptime
 11:24:15 up 39 min,  1 user,  load average: 3.54, 3.47, 2.87

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 137216
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-20 18:26 fork: Resource temporarily unavailable / cant start new threads mark
@ 2008-05-21 20:28 ` Randy Dunlap
  2008-05-21 20:39   ` mark
  2008-05-21 20:39   ` Johannes Weiner
  0 siblings, 2 replies; 14+ messages in thread
From: Randy Dunlap @ 2008-05-21 20:28 UTC (permalink / raw)
  To: mark; +Cc: linux-kernel

On Tue, 20 May 2008 11:26:47 -0700 mark wrote:

> I upgraded to 2.6.25.3-18.fc9.x86_64 fedora core 9, now I get this
> error when I try to login to the box, kill a pr start a python app, or
> do anything on a regular basis.
> 
> fork: Resource temporarily unavailable
> 
> I have over 10GB RAM free, and zero swap spaced used. The box is a
> dual quad core Intel Xeon 5405 with 16GB RAM.
> 
> There is no error message in /var/log/messages or dmesg ...
> how do I identify the problem?
> thanks!
> 
> uname -a
> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
> x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> free -m
>             total       used       free     shared    buffers     cached
> Mem:         16086       3189      12896          0         42        666
> -/+ buffers/cache:       2481      13605
> Swap:         1983          0       1983
> 
> 
> have only 505 processes running
> ps aux | wc -l
> 505
> 
> 
> uptime
>  11:24:15 up 39 min,  1 user,  load average: 3.54, 3.47, 2.87
> 
> ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 137216
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 1024
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited

Hi,

The only place that fork() returns EAGAIN is for number of
processes being >= its limit.  Does this user already have >= 1024
processes?
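
One way to answer that question is to count tasks rather than
processes. On Linux, RLIMIT_NPROC (ulimit -u) is charged per thread,
while "ps aux" prints one line per process, so a multi-threaded
workload can sit well above the figure that "ps aux | wc -l" suggests.
A minimal sketch, assuming a procps-style ps that supports -L; the
helper name count_by_user is made up for illustration:

```shell
#!/bin/sh
# count_by_user (hypothetical helper): read one user name per line on
# stdin and print "count user" pairs, busiest user first.
count_by_user() {
    sort | uniq -c | sort -rn
}

# -e: every process, -L: one line per thread, -o user=: owner only, no header.
# Guarded so the script still loads on systems without a procps-style ps.
if ps -eL -o user= >/dev/null 2>&1; then
    ps -eL -o user= | count_by_user | head -n 5
fi
```

The count printed for a given user can then be compared with that
user's "ulimit -u" value.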


---
~Randy


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 20:28 ` Randy Dunlap
@ 2008-05-21 20:39   ` mark
  2008-05-21 20:50     ` Randy Dunlap
  2008-05-21 20:39   ` Johannes Weiner
  1 sibling, 1 reply; 14+ messages in thread
From: mark @ 2008-05-21 20:39 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel

On Wed, May 21, 2008 at 1:28 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> On Tue, 20 May 2008 11:26:47 -0700 mark wrote:
>> I upgraded to 2.6.25.3-18.fc9.x86_64 fedora core 9, now I get this
>> error when I try to login to the box, kill a pr start a python app, or
>> do anything on a regular basis.
>>
>> fork: Resource temporarily unavailable
>>
>> I have over 10GB RAM free, and zero swap spaced used. The box is a
>> dual quad core Intel Xeon 5405 with 16GB RAM.
>>
>> There is no error message in /var/log/messages or dmesg ...
>> how do I identify the problem?
>> thanks!
>>
>> uname -a
>> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>> free -m
>>             total       used       free     shared    buffers     cached
>> Mem:         16086       3189      12896          0         42        666
>> -/+ buffers/cache:       2481      13605
>> Swap:         1983          0       1983
>>
>>
>> have only 505 processes running
>> ps aux | wc -l
>> 505
>>
>>
>> uptime
>>  11:24:15 up 39 min,  1 user,  load average: 3.54, 3.47, 2.87
>>
>> ulimit -a
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 137216
>> max locked memory       (kbytes, -l) 32
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 32768
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) 10240
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 1024
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
> The only place that fork() returns EAGAIN is for number of
> processes being >= its limit.  Does this user already have >= 1024
> processes?

No, it is around 400

ps ax | wc -l
417

I also increased max user processes to unlimited, and I still get the
error randomly.

ulimit -u
unlimited
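
Note that "ulimit -u" only changes the limit for the shell it is run in
and for processes started from that shell afterwards; daemons that are
already running, or that are started from elsewhere, keep whatever
limit they inherited. A hedged sketch for reading the limit a running
process actually has, assuming the kernel provides /proc/<pid>/limits
(which should be present on 2.6.24 and later); the function name
soft_nproc is illustrative only:

```shell
#!/bin/sh
# soft_nproc (hypothetical helper): read a /proc/<pid>/limits listing on
# stdin and print the soft "Max processes" limit (third field of that row).
soft_nproc() {
    awk '/^Max processes/ { print $3 }'
}

# Example: the limit of the current shell. Replace $$ with the PID of the
# web server or of one of the Python apps to inspect those instead.
if [ -r "/proc/$$/limits" ]; then
    soft_nproc < "/proc/$$/limits"
fi
```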

my webserver is now throwing this error:

setuid(500) failed (11: Resource temporarily unavailable)


cat /etc/passwd | grep mark
mark:x:500:500::/home/mark:/bin/bash

I also increased this, but still the same error
kernel.pid_max =  65536


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 20:28 ` Randy Dunlap
  2008-05-21 20:39   ` mark
@ 2008-05-21 20:39   ` Johannes Weiner
  1 sibling, 0 replies; 14+ messages in thread
From: Johannes Weiner @ 2008-05-21 20:39 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: mark, linux-kernel

Hi,

Randy Dunlap <randy.dunlap@oracle.com> writes:

> On Tue, 20 May 2008 11:26:47 -0700 mark wrote:
>> have only 505 processes running
>> ps aux | wc -l
>> 505

[ quoting deleted for clarification ]

>> max user processes              (-u) 1024

> The only place that fork() returns EAGAIN is for number of
> processes being >= its limit.  Does this user already have >= 1024
> processes?

	Hannes


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 20:39   ` mark
@ 2008-05-21 20:50     ` Randy Dunlap
  2008-05-21 21:08       ` mark
  0 siblings, 1 reply; 14+ messages in thread
From: Randy Dunlap @ 2008-05-21 20:50 UTC (permalink / raw)
  To: mark; +Cc: linux-kernel

mark wrote:
> On Wed, May 21, 2008 at 1:28 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
>> On Tue, 20 May 2008 11:26:47 -0700 mark wrote:
>>> I upgraded to 2.6.25.3-18.fc9.x86_64 fedora core 9, now I get this
>>> error when I try to login to the box, kill a pr start a python app, or
>>> do anything on a regular basis.
>>>
>>> fork: Resource temporarily unavailable
>>>
>>> I have over 10GB RAM free, and zero swap spaced used. The box is a
>>> dual quad core Intel Xeon 5405 with 16GB RAM.
>>>
>>> There is no error message in /var/log/messages or dmesg ...
>>> how do I identify the problem?
>>> thanks!
>>>
>>> uname -a
>>> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
>>> x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> free -m
>>>             total       used       free     shared    buffers     cached
>>> Mem:         16086       3189      12896          0         42        666
>>> -/+ buffers/cache:       2481      13605
>>> Swap:         1983          0       1983
>>>
>>>
>>> have only 505 processes running
>>> ps aux | wc -l
>>> 505
>>>
>>>
>>> uptime
>>>  11:24:15 up 39 min,  1 user,  load average: 3.54, 3.47, 2.87
>>>
>>> ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 137216
>>> max locked memory       (kbytes, -l) 32
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 32768
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 10240
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 1024
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>> The only place that fork() returns EAGAIN is for number of
>> processes being >= its limit.  Does this user already have >= 1024
>> processes?
> 
> No, it is around 400

Well, my comment was wrong anyway.  There are several other tests just
below number of user processes that also return EAGAIN, like:

- total number of threads being too large
- error on grabbing a module reference count (?)
- error on grabbing a binfmt module reference


> ps ax | wc -l
> 417
> 
> I also I increased max process to unlimited, and I still get the error
> randomly..
> 
> ulimit -u
> unlimited
> 
> my webserver is now throwing this error:
> 
> setuid(500) failed (11: Resource temporarily unavailable)

That's all of the useful information??

> 
> cat /etc/passwd | grep mark
> mark:x:500:500::/home/mark:/bin/bash
> 
> I also increased this, but still the same error
> kernel.pid_max =  65536


-- 
~Randy


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 20:50     ` Randy Dunlap
@ 2008-05-21 21:08       ` mark
  2008-05-21 21:15         ` Jesper Juhl
  2008-05-21 21:32         ` Randy Dunlap
  0 siblings, 2 replies; 14+ messages in thread
From: mark @ 2008-05-21 21:08 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel

On Wed, May 21, 2008 at 1:50 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> mark wrote:
>>
>> On Wed, May 21, 2008 at 1:28 PM, Randy Dunlap <randy.dunlap@oracle.com>
>> wrote:
>>>
>>> On Tue, 20 May 2008 11:26:47 -0700 mark wrote:
>>>>
>>>> I upgraded to 2.6.25.3-18.fc9.x86_64 fedora core 9, now I get this
>>>> error when I try to login to the box, kill a pr start a python app, or
>>>> do anything on a regular basis.
>>>>
>>>> fork: Resource temporarily unavailable
>>>>
>>>> I have over 10GB RAM free, and zero swap spaced used. The box is a
>>>> dual quad core Intel Xeon 5405 with 16GB RAM.
>>>>
>>>> There is no error message in /var/log/messages or dmesg ...
>>>> how do I identify the problem?
>>>> thanks!
>>>>
>>>> uname -a
>>>> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
>>>> x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>>
>>>> free -m
>>>>             total       used       free     shared    buffers     cached
>>>> Mem:         16086       3189      12896          0         42        666
>>>> -/+ buffers/cache:       2481      13605
>>>> Swap:         1983          0       1983
>>>>
>>>>
>>>> have only 505 processes running
>>>> ps aux | wc -l
>>>> 505
>>>>
>>>>
>>>> uptime
>>>>  11:24:15 up 39 min,  1 user,  load average: 3.54, 3.47, 2.87
>>>>
>>>> ulimit -a
>>>> core file size          (blocks, -c) 0
>>>> data seg size           (kbytes, -d) unlimited
>>>> scheduling priority             (-e) 0
>>>> file size               (blocks, -f) unlimited
>>>> pending signals                 (-i) 137216
>>>> max locked memory       (kbytes, -l) 32
>>>> max memory size         (kbytes, -m) unlimited
>>>> open files                      (-n) 32768
>>>> pipe size            (512 bytes, -p) 8
>>>> POSIX message queues     (bytes, -q) 819200
>>>> real-time priority              (-r) 0
>>>> stack size              (kbytes, -s) 10240
>>>> cpu time               (seconds, -t) unlimited
>>>> max user processes              (-u) 1024
>>>> virtual memory          (kbytes, -v) unlimited
>>>> file locks                      (-x) unlimited
>>>
>>> The only place that fork() returns EAGAIN is for number of
>>> processes being >= its limit.  Does this user already have >= 1024
>>> processes?
>>
>> No, it is around 400
>
> Well, my comment was wrong anyway.  There are several other tests just
> below number of user processes that also return EAGAIN, like:
>
> - total number of threads being too large
> - error on grabbing a module reference count (?)
> - error on grabbing a binfmt module reference

As a user, how do I identify what is wrong and fix this? For the total
number of threads, is there any way I can find out if this is causing
the problem? My system is running around 80 multi-threaded Python web
apps.



>> my webserver is now throwing this error:
>>
>> setuid(500) failed (11: Resource temporarily unavailable)
>
> That's all of the useful information??

Yes. I get this error when I restart the web server. If I kill all
other apps and then start it again, it starts fine.

This is the complete error message:
2008/05/21 08:02:19 [emerg] 30558#0: setuid(500) failed (11: Resource temporarily unavailable)
2008/05/21 08:02:19 [alert] 30557#0: worker process 30558 exited with fatal code 2 and can not be respawn


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 21:08       ` mark
@ 2008-05-21 21:15         ` Jesper Juhl
  2008-05-21 21:27           ` mark
  2008-05-21 21:32         ` Randy Dunlap
  1 sibling, 1 reply; 14+ messages in thread
From: Jesper Juhl @ 2008-05-21 21:15 UTC (permalink / raw)
  To: mark; +Cc: Randy Dunlap, linux-kernel

2008/5/21 mark <markkicks@gmail.com>:
<snip>
>>> my webserver is now throwing this error:
>>>
>>> setuid(500) failed (11: Resource temporarily unavailable)
>>
>> That's all of the useful information??
>
> Yes. i get this error  when i restart the web server. if i kill all
> other apps, and then start it again it starts fine.
>
> this is the complete error message,
> 2008/05/21 08:02:19 [emerg] 30558#0: setuid(500) failed (11: Resource
> temporarily unavailable)
> 2008/05/21 08:02:19 [alert] 30557#0: worker process 30558 exited with
> fatal code 2 and can not be respawn

What about if you run 'dmesg'? are there any clues in that output?
any kernel stack traces? error messages? warnings? anything out of the
ordinary?

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 21:15         ` Jesper Juhl
@ 2008-05-21 21:27           ` mark
  0 siblings, 0 replies; 14+ messages in thread
From: mark @ 2008-05-21 21:27 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Randy Dunlap, linux-kernel

On Wed, May 21, 2008 at 2:15 PM, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> 2008/5/21 mark <markkicks@gmail.com>:
> <snip>
>>>> my webserver is now throwing this error:
>>>>
>>>> setuid(500) failed (11: Resource temporarily unavailable)
>>>
>>> That's all of the useful information??
>>
>> Yes. i get this error  when i restart the web server. if i kill all
>> other apps, and then start it again it starts fine.
>>
>> this is the complete error message,
>> 2008/05/21 08:02:19 [emerg] 30558#0: setuid(500) failed (11: Resource
>> temporarily unavailable)
>> 2008/05/21 08:02:19 [alert] 30557#0: worker process 30558 exited with
>> fatal code 2 and can not be respawn
>
> What about if you run 'dmesg'? are there any clues in that output?
> any kernel stack traces? error messages? warnings? anything out of the
> ordinary?
No. There are no new messages after the kernel boot messages.


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 21:08       ` mark
  2008-05-21 21:15         ` Jesper Juhl
@ 2008-05-21 21:32         ` Randy Dunlap
  2008-05-21 22:51           ` mark
  1 sibling, 1 reply; 14+ messages in thread
From: Randy Dunlap @ 2008-05-21 21:32 UTC (permalink / raw)
  To: mark; +Cc: linux-kernel

On Wed, 21 May 2008 14:08:53 -0700 mark wrote:

> On Wed, May 21, 2008 at 1:50 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> > mark wrote:
> >>
> >> On Wed, May 21, 2008 at 1:28 PM, Randy Dunlap <randy.dunlap@oracle.com>
> >> wrote:
> >>>
> >>> On Tue, 20 May 2008 11:26:47 -0700 mark wrote:
> >>>>
> >>>> I upgraded to 2.6.25.3-18.fc9.x86_64 fedora core 9, now I get this
> >>>> error when I try to login to the box, kill a pr start a python app, or
> >>>> do anything on a regular basis.
> >>>>
> >>>> fork: Resource temporarily unavailable
> >>>>
> >>>> I have over 10GB RAM free, and zero swap spaced used. The box is a
> >>>> dual quad core Intel Xeon 5405 with 16GB RAM.
> >>>>
> >>>> There is no error message in /var/log/messages or dmesg ...
> >>>> how do I identify the problem?
> >>>> thanks!
> >>>>
> >>>> uname -a
> >>>> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
> >>>> x86_64 x86_64 x86_64 GNU/Linux
> >>>>
> >>>>
> >>>> free -m
> >>>>             total       used       free     shared    buffers     cached
> >>>> Mem:         16086       3189      12896          0         42        666
> >>>> -/+ buffers/cache:       2481      13605
> >>>> Swap:         1983          0       1983
> >>>>
> >>>>
> >>>> have only 505 processes running
> >>>> ps aux | wc -l
> >>>> 505
> >>>>
> >>>>
> >>>> uptime
> >>>>  11:24:15 up 39 min,  1 user,  load average: 3.54, 3.47, 2.87
> >>>>
> >>>> ulimit -a
> >>>> core file size          (blocks, -c) 0
> >>>> data seg size           (kbytes, -d) unlimited
> >>>> scheduling priority             (-e) 0
> >>>> file size               (blocks, -f) unlimited
> >>>> pending signals                 (-i) 137216
> >>>> max locked memory       (kbytes, -l) 32
> >>>> max memory size         (kbytes, -m) unlimited
> >>>> open files                      (-n) 32768
> >>>> pipe size            (512 bytes, -p) 8
> >>>> POSIX message queues     (bytes, -q) 819200
> >>>> real-time priority              (-r) 0
> >>>> stack size              (kbytes, -s) 10240
> >>>> cpu time               (seconds, -t) unlimited
> >>>> max user processes              (-u) 1024
> >>>> virtual memory          (kbytes, -v) unlimited
> >>>> file locks                      (-x) unlimited
> >>>
> >>> The only place that fork() returns EAGAIN is for number of
> >>> processes being >= its limit.  Does this user already have >= 1024
> >>> processes?
> >>
> >> No, it is around 400
> >
> > Well, my comment was wrong anyway.  There are several other tests just
> > below number of user processes that also return EAGAIN, like:
> >
> > - total number of threads being too large

The total number of threads that currently exist is in /proc/loadavg:

$ cat /proc/loadavg
1.56 0.58 0.27 2/203 28500

It's the number following the '/', e.g., 203 on my desktop system.

max_threads allowed is a sysctl, so you can tune it if needed.
It's in /proc/sys/kernel/threads-max:

$ cat /proc/sys/kernel/threads-max
32624

I sort of doubt that one is the problem, but you can tell us.
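
The comparison can be scripted. In the fourth field of /proc/loadavg,
the number before the '/' is how many tasks are currently runnable and
the number after it is how many exist in total, which is the value to
hold against threads-max. A small sketch; the function name total_tasks
is an illustrative choice, not an existing tool:

```shell
#!/bin/sh
# total_tasks (hypothetical helper): read a /proc/loadavg line on stdin
# and print the number after the '/' in the fourth field.
total_tasks() {
    awk '{ split($4, a, "/"); print a[2] }'
}

if [ -r /proc/loadavg ] && [ -r /proc/sys/kernel/threads-max ]; then
    cur=$(total_tasks < /proc/loadavg)
    max=$(cat /proc/sys/kernel/threads-max)
    echo "tasks: $cur  threads-max: $max"
fi
```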

> > - error on grabbing a module reference count (?)
> > - error on grabbing a binfmt module reference
> 
> as a user how do i identify what is wrong, and fix this? for total
> number of threads -> is there anyway i can find out if this is causing
> the problem? my system is running around 80 multi-threaded python web
> apps.

I can send you some debug patches that will print out the specific
problem area.  Do you want to do that?  Can you rebuild and install
a new kernel?


> >> my webserver is now throwing this error:
> >>
> >> setuid(500) failed (11: Resource temporarily unavailable)
> >
> > That's all of the useful information??
> 
> Yes. i get this error  when i restart the web server. if i kill all
> other apps, and then start it again it starts fine.
> 
> this is the complete error message,
> 2008/05/21 08:02:19 [emerg] 30558#0: setuid(500) failed (11: Resource
> temporarily unavailable)
> 2008/05/21 08:02:19 [alert] 30557#0: worker process 30558 exited with
> fatal code 2 and can not be respawn


---
~Randy


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 21:32         ` Randy Dunlap
@ 2008-05-21 22:51           ` mark
  2008-05-21 23:35             ` Randy Dunlap
  0 siblings, 1 reply; 14+ messages in thread
From: mark @ 2008-05-21 22:51 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel

On Wed, May 21, 2008 at 2:32 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> On Wed, 21 May 2008 14:08:53 -0700 mark wrote:
>
>> On Wed, May 21, 2008 at 1:50 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
>> > mark wrote:
>> >>
>> >> On Wed, May 21, 2008 at 1:28 PM, Randy Dunlap <randy.dunlap@oracle.com>
>> >> wrote:
>> >>>
>> >>> On Tue, 20 May 2008 11:26:47 -0700 mark wrote:
>> >>>>
>> >>>> I upgraded to 2.6.25.3-18.fc9.x86_64 fedora core 9, now I get this
>> >>>> error when I try to login to the box, kill a pr start a python app, or
>> >>>> do anything on a regular basis.
>> >>>>
>> >>>> fork: Resource temporarily unavailable
>> >>>>
>> >>>> I have over 10GB RAM free, and zero swap spaced used. The box is a
>> >>>> dual quad core Intel Xeon 5405 with 16GB RAM.
>> >>>>
>> >>>> There is no error message in /var/log/messages or dmesg ...
>> >>>> how do I identify the problem?
>> >>>> thanks!
>> >>>>
>> >>>> uname -a
>> >>>> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
>> >>>> x86_64 x86_64 x86_64 GNU/Linux
>> >>>>
>> >>>>
>> >>>> free -m
>> >>>>             total       used       free     shared    buffers     cached
>> >>>> Mem:         16086       3189      12896          0         42        666
>> >>>> -/+ buffers/cache:       2481      13605
>> >>>> Swap:         1983          0       1983
>> >>>>
>> >>>>
>> >>>> have only 505 processes running
>> >>>> ps aux | wc -l
>> >>>> 505
>> >>>>
>> >>>>
>> >>>> uptime
>> >>>>  11:24:15 up 39 min,  1 user,  load average: 3.54, 3.47, 2.87
>> >>>>
>> >>>> ulimit -a
>> >>>> core file size          (blocks, -c) 0
>> >>>> data seg size           (kbytes, -d) unlimited
>> >>>> scheduling priority             (-e) 0
>> >>>> file size               (blocks, -f) unlimited
>> >>>> pending signals                 (-i) 137216
>> >>>> max locked memory       (kbytes, -l) 32
>> >>>> max memory size         (kbytes, -m) unlimited
>> >>>> open files                      (-n) 32768
>> >>>> pipe size            (512 bytes, -p) 8
>> >>>> POSIX message queues     (bytes, -q) 819200
>> >>>> real-time priority              (-r) 0
>> >>>> stack size              (kbytes, -s) 10240
>> >>>> cpu time               (seconds, -t) unlimited
>> >>>> max user processes              (-u) 1024
>> >>>> virtual memory          (kbytes, -v) unlimited
>> >>>> file locks                      (-x) unlimited
>> >>>
>> >>> The only place that fork() returns EAGAIN is for number of
>> >>> processes being >= its limit.  Does this user already have >= 1024
>> >>> processes?
>> >>
>> >> No, it is around 400
>> >
>> > Well, my comment was wrong anyway.  There are several other tests just
>> > below number of user processes that also return EAGAIN, like:
>> >
>> > - total number of threads being too large
>
> Total number of threads currently running is in /proc/loadavg:
>
>> cat /proc/loadavg
> 1.56 0.58 0.27 2/203 28500
>
> It's the number following the '/', e.g., 203 on my desktop system.
>
> max_threads allowed is a sysctl, so you can tune it if needed.
> It's in /proc/sys/kernel/threads-max:
>
>> cat /proc/sys/kernel/threads-max
> 32624
> I sort of doubt that one is the problem, but you can tell us.

cat /proc/loadavg
0.39 0.45 0.57 1/1412 12032
cat /proc/sys/kernel/threads-max
274432
you are right, i guess this is not the problem.


>> > - error on grabbing a module reference count (?)
>> > - error on grabbing a binfmt module reference
>>
>> as a user how do i identify what is wrong, and fix this? for total
>> number of threads -> is there anyway i can find out if this is causing
>> the problem? my system is running around 80 multi-threaded python web
>> apps.
>
> I can send you some debug patches that will print out the specific
> problem area.  Do you want to do that?  Can you rebuild and install
> a new kernel?
Is it possible to get these debug messages by turning on some flags?
If not, then yes, please send the debug patches. It's a live box, but I
will try to do it!

This is my system / kernel info:
uname -a
Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
x86_64 x86_64 x86_64 GNU/Linux

thanks a lot!!!!


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 22:51           ` mark
@ 2008-05-21 23:35             ` Randy Dunlap
  2008-05-22  0:09               ` mark
  0 siblings, 1 reply; 14+ messages in thread
From: Randy Dunlap @ 2008-05-21 23:35 UTC (permalink / raw)
  To: mark; +Cc: linux-kernel

On Wed, 21 May 2008 15:51:55 -0700 mark wrote:

> On Wed, May 21, 2008 at 2:32 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> > On Wed, 21 May 2008 14:08:53 -0700 mark wrote:
> >
> >> On Wed, May 21, 2008 at 1:50 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> >> > mark wrote:
> >> >>
> >> >> On Wed, May 21, 2008 at 1:28 PM, Randy Dunlap <randy.dunlap@oracle.com>
> >> >> wrote:
> >> >>>
> >> >>> On Tue, 20 May 2008 11:26:47 -0700 mark wrote:
> >> >>>>
> >> >>>> I upgraded to 2.6.25.3-18.fc9.x86_64 fedora core 9, now I get this
> >> >>>> error when I try to login to the box, kill a pr start a python app, or
> >> >>>> do anything on a regular basis.
> >> >>>>
> >> >>>> fork: Resource temporarily unavailable

[snip]

> >> >>> The only place that fork() returns EAGAIN is for number of
> >> >>> processes being >= its limit.  Does this user already have >= 1024
> >> >>> processes?
> >> >>
> >> >> No, it is around 400
> >> >
> >> > Well, my comment was wrong anyway.  There are several other tests just
> >> > below number of user processes that also return EAGAIN, like:
> >> >
> >> > - total number of threads being too large
> >
> > Total number of threads currently running is in /proc/loadavg:
> >
> >> cat /proc/loadavg
> > 1.56 0.58 0.27 2/203 28500
> >
> > It's the number following the '/', e.g., 203 on my desktop system.
> >
> > max_threads allowed is a sysctl, so you can tune it if needed.
> > It's in /proc/sys/kernel/threads-max:
> >
> >> cat /proc/sys/kernel/threads-max
> > 32624
> > I sort of doubt that one is the problem, but you can tell us.
> 
> cat /proc/loadavg
> 0.39 0.45 0.57 1/1412 12032
> cat /proc/sys/kernel/threads-max
> 274432
> you are right, i guess this is not the problem.
> 
> 
> >> > - error on grabbing a module reference count (?)
> >> > - error on grabbing a binfmt module reference
> >>
> >> as a user how do i identify what is wrong, and fix this? for total
> >> number of threads -> is there anyway i can find out if this is causing
> >> the problem? my system is running around 80 multi-threaded python web
> >> apps.
> >
> > I can send you some debug patches that will print out the specific
> > problem area.  Do you want to do that?  Can you rebuild and install
> > a new kernel?
> Is it possible to get this debug messages by turning on some flags?
> If not yes, pl. send debug patches. its a live box and  I will try to do it!
> 
> This is my system / kernel info:
> uname -a
> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
> x86_64 x86_64 x86_64 GNU/Linux

I made a small patch to a vanilla kernel.org 2.6.25.3 kernel tree.
Hopefully it applies cleanly to that fc9 kernel source, but check/verify
that first before going any further.

After building and booting with this patch, there will be kernel
messages whenever fork's "copy_process" function fails with -EAGAIN (-11),
which is reported to userspace as errno = 11 (Resource temporarily
unavailable).  Hopefully this will identify which test is failing,
but there's a chance that something else is going on and that this
patch does not find the problem.

Anyway, good luck and please report back on it.
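
A possible way to follow these steps, sketched under assumptions: the
patch below has been saved as fork-debug.patch (a made-up file name)
and the commands are run from the top of the kernel source tree.
"patch --dry-run" reports whether the hunks apply without modifying any
file, and since the printk calls use __func__, the messages should
start with "copy_process: error on".

```shell
#!/bin/sh
# Hypothetical file name; use wherever the patch from this mail was saved.
PATCH=fork-debug.patch

# Check that the patch applies to the tree in the current directory
# without changing anything.
if [ -f "$PATCH" ]; then
    patch -p1 --dry-run < "$PATCH"
fi

# fork_errors (illustrative helper): keep only the lines added by the patch.
fork_errors() {
    grep 'copy_process: error on' || true
}

# After booting the patched kernel and reproducing the failure:
dmesg 2>/dev/null | fork_errors
```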

---

---
 kernel/fork.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- linux-2.6.25.3.orig/kernel/fork.c
+++ linux-2.6.25.3/kernel/fork.c
@@ -1049,8 +1049,10 @@ static struct task_struct *copy_process(
 	if (atomic_read(&p->user->processes) >=
 			p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
 		if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
-		    p->user != current->nsproxy->user_ns->root_user)
+		    p->user != current->nsproxy->user_ns->root_user) {
+			printk(KERN_INFO "%s: error on #processes\n", __func__);
 			goto bad_fork_free;
+		}
 	}
 
 	atomic_inc(&p->user->__count);
@@ -1062,14 +1064,20 @@ static struct task_struct *copy_process(
 	 * triggers too late. This doesn't hurt, the check is only there
 	 * to stop root fork bombs.
 	 */
-	if (nr_threads >= max_threads)
+	if (nr_threads >= max_threads) {
+		printk(KERN_INFO "%s: error on #threads\n", __func__);
 		goto bad_fork_cleanup_count;
+	}
 
-	if (!try_module_get(task_thread_info(p)->exec_domain->module))
+	if (!try_module_get(task_thread_info(p)->exec_domain->module)) {
+		printk(KERN_INFO "%s: error on exec_domain->module\n", __func__);
 		goto bad_fork_cleanup_count;
+	}
 
-	if (p->binfmt && !try_module_get(p->binfmt->module))
+	if (p->binfmt && !try_module_get(p->binfmt->module)) {
+		printk(KERN_INFO "%s: error on binfmt->module\n", __func__);
 		goto bad_fork_cleanup_put_domain;
+	}
 
 	p->did_exec = 0;
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-21 23:35             ` Randy Dunlap
@ 2008-05-22  0:09               ` mark
  2008-05-22  0:29                 ` Randy Dunlap
  2008-05-22  7:11                 ` するくめ
  0 siblings, 2 replies; 14+ messages in thread
From: mark @ 2008-05-22  0:09 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel

On Wed, May 21, 2008 at 4:35 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
>> > I can send you some debug patches that will print out the specific
>> > problem area.  Do you want to do that?  Can you rebuild and install
>> > a new kernel?
>> Is it possible to get these debug messages by turning on some flags?
>> If not, then yes, please send the debug patches. It's a live box and I will try to do it!
>>
>> This is my system / kernel info:
>> uname -a
>> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
>> x86_64 x86_64 x86_64 GNU/Linux
>
> I made a small patch to a vanilla kernel.org 2.6.25.3 kernel tree.
> Hopefully it applies cleanly to that fc9 kernel source, but check/verify
> that first before going any further.

Thanks a lot for the patch.
This is kind of weird, but there is no kernel/fork.c file:

[mark@localhost 2.6.25.3-18.fc9.i686]$ find . -iname '*fork*' -print
[mark@localhost 2.6.25.3-18.fc9.i686]$ rpm -ql
[mark@localhost 2.6.25.3-18.fc9.i686]$ rpm -ql kernel-devel | grep fork
[mark@localhost 2.6.25.3-18.fc9.i686]$ pwd
/usr/src/kernels/2.6.25.3-18.fc9.i686


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-22  0:09               ` mark
@ 2008-05-22  0:29                 ` Randy Dunlap
  2008-05-22  7:11                 ` するくめ
  1 sibling, 0 replies; 14+ messages in thread
From: Randy Dunlap @ 2008-05-22  0:29 UTC
  To: mark; +Cc: linux-kernel

mark wrote:
> On Wed, May 21, 2008 at 4:35 PM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
>>>> I can send you some debug patches that will print out the specific
>>>> problem area.  Do you want to do that?  Can you rebuild and install
>>>> a new kernel?
>>> Is it possible to get these debug messages by turning on some flags?
>>> If not, then yes, please send the debug patches. It's a live box and I will try to do it!
>>>
>>> This is my system / kernel info:
>>> uname -a
>>> Linux XXX 2.6.25.3-18.fc9.x86_64 #1 SMP Tue May 13 04:54:47 EDT 2008
>>> x86_64 x86_64 x86_64 GNU/Linux
>> I made a small patch to a vanilla kernel.org 2.6.25.3 kernel tree.
>> Hopefully it applies cleanly to that fc9 kernel source, but check/verify
>> that first before going any further.
> 
> Thanks a lot for the patch.
> This is kind of weird, but there is no kernel/fork.c file:
> 
> [mark@localhost 2.6.25.3-18.fc9.i686]$ find . -iname '*fork*' -print
> [mark@localhost 2.6.25.3-18.fc9.i686]$ rpm -ql
> [mark@localhost 2.6.25.3-18.fc9.i686]$ rpm -ql kernel-devel | grep fork
> [mark@localhost 2.6.25.3-18.fc9.i686]$ pwd
> /usr/src/kernels/2.6.25.3-18.fc9.i686

That's not a kernel source tree.

I'm no expert on Fedora or on src.rpm files, but I think that you need to get
the Fedora kernel-2.6.25.3-18.src.rpm file (or something like that).

Or use a plain vanilla kernel.org 2.6.25.3 kernel tree.
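
Before applying the patch to either tree, it may help to confirm that the tree actually contains the file the patch modifies. A minimal sketch, assuming only that the patch target is kernel/fork.c relative to the top of the tree; has_patch_target() is a made-up helper name:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if <tree>/<relpath> exists and is readable, 0 otherwise.
 * An over-long combined path is treated as "not found".
 */
int has_patch_target(const char *tree, const char *relpath)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/%s", tree, relpath);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;
	return access(path, R_OK) == 0;
}
```

For example, has_patch_target("/usr/src/kernels/2.6.25.3-18.fc9.i686", "kernel/fork.c") would be expected to return 0 for the directory shown above, since the find there printed nothing, and 1 for an unpacked vanilla 2.6.25.3 tree.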

-- 
~Randy


* Re: fork: Resource temporarily unavailable / cant start new threads
  2008-05-22  0:09               ` mark
  2008-05-22  0:29                 ` Randy Dunlap
@ 2008-05-22  7:11                 ` するくめ
  1 sibling, 0 replies; 14+ messages in thread
From: するくめ @ 2008-05-22  7:11 UTC
  To: linux-kernel

Take a look at this guide to installing the kernel source on Fedora 9:

http://www.mjmwired.net/resources/mjm-fedora-f9.html#kernelsrc

-- 
するくめ

Thread overview: 14+ messages
2008-05-20 18:26 fork: Resource temporarily unavailable / cant start new threads mark
2008-05-21 20:28 ` Randy Dunlap
2008-05-21 20:39   ` mark
2008-05-21 20:50     ` Randy Dunlap
2008-05-21 21:08       ` mark
2008-05-21 21:15         ` Jesper Juhl
2008-05-21 21:27           ` mark
2008-05-21 21:32         ` Randy Dunlap
2008-05-21 22:51           ` mark
2008-05-21 23:35             ` Randy Dunlap
2008-05-22  0:09               ` mark
2008-05-22  0:29                 ` Randy Dunlap
2008-05-22  7:11                 ` するくめ
2008-05-21 20:39   ` Johannes Weiner