public inbox for linux-kernel@vger.kernel.org
* RE: System hang problem.
@ 2006-10-04  0:07 Manish Neema
  2006-10-04  0:49 ` Alan Cox
  2006-10-04  3:26 ` Willy Tarreau
  0 siblings, 2 replies; 9+ messages in thread
From: Manish Neema @ 2006-10-04  0:07 UTC
  To: Keith Mannthey; +Cc: linux-kernel

Thanks Keith for the response.

My earlier explanation was not clear. The "automount" process dying under
restrictive overcommit settings is not caused by an OOM kill. It looks
like a bug in the "automount" binary itself, which makes it exit when it
cannot service a new request.

Running "cd /remote/something" when the system is out of allocatable
memory produces the events below (taken from /var/log/messages):

Oct  3 13:35:32 gentoo036 automount[2060]: handle_packet_missing: fork: Cannot allocate memory
Oct  3 13:35:34 gentoo036 automount[2060]: can't unmount /remote

And then the automount process for /remote mount disappears, which
should not happen.
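
For illustration only, the failure mode above (fork failing with "Cannot
allocate memory" while a request is being serviced) can be handled without
the daemon exiting. The sketch below is not automount's code: the name
fork_with_retry and all of its parameters are hypothetical, and retrying
with a short backoff is just one possible policy.

```python
import errno
import os
import time


def fork_with_retry(fork=os.fork, attempts=3, delay=0.5, sleep=time.sleep):
    """Call fork(), retrying on ENOMEM/EAGAIN instead of giving up.

    Returns the fork() result, or None if every attempt failed, so the
    caller can fail that one request and keep the daemon running.
    (Hypothetical helper, not part of automount.)
    """
    for attempt in range(attempts):
        try:
            return fork()
        except OSError as exc:
            # Only transient resource shortages are worth retrying.
            if exc.errno not in (errno.ENOMEM, errno.EAGAIN):
                raise
            if attempt + 1 < attempts:
                sleep(delay * (attempt + 1))  # linear backoff
    return None
```

A caller that gets None back would report the mount request as failed and
return to its main loop, rather than tearing down the /remote mount point.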

Thanks anyway; I'll try to take it up with RedHat again.

-Manish 



* RE: System hang problem.
@ 2006-10-04 14:24 Al Boldi
  0 siblings, 0 replies; 9+ messages in thread
From: Al Boldi @ 2006-10-04 14:24 UTC
  To: linux-kernel

Manish Neema wrote:
> > What you can often do, if you have one application using much memory,
> > is limiting *this application's* memory usage with ulimit. If the
> > application correctly handles malloc()==NULL, then at least your
> > system will behave stably.
>
> The problem is that it's a different application and a different user
> each time (a typical large R&D environment). /etc/security/limits.conf
> allows setting the max resident set size. Is there a way to limit based
> on the total virtual size?

You mean like: ulimit -v [total VMsize/runqueue]

I suppose this could easily be calculated dynamically by the kernel, which
would go a long way toward keeping the OOM killer from being triggered.
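
As a rough user-space illustration of the "total VMsize/runqueue" idea, here
is a minimal sketch. It assumes the run-queue length is taken from the fourth
field of /proc/loadavg ("running/total"); both function names are
hypothetical, and this is not something the kernel does today.

```python
def per_process_vm_limit_kib(total_vm_kib, runnable):
    """Split the total VM budget evenly across runnable processes."""
    # Guard against an empty run queue so the division is always defined.
    return total_vm_kib // max(runnable, 1)


def runnable_from_loadavg(text):
    """Extract the runnable count from /proc/loadavg contents.

    The fourth field has the form "running/total".
    """
    return int(text.split()[3].split("/")[0])
```

The result is in KiB, which is the unit bash's "ulimit -v" expects.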


Thanks!

--
Al


* RE: System hang problem.
@ 2006-10-04  5:44 Manish Neema
  0 siblings, 0 replies; 9+ messages in thread
From: Manish Neema @ 2006-10-04  5:44 UTC
  To: Willy Tarreau; +Cc: Keith Mannthey, linux-kernel

> What you can often do, if you have one application using much memory, 
> is limiting *this application's* memory usage with ulimit. If the 
> application correctly handles malloc()==NULL, then at least your 
> system will behave stably.

The problem is that it's a different application and a different user each
time (a typical large R&D environment). /etc/security/limits.conf allows
setting the max resident set size. Is there a way to limit based on the
total virtual size?
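
For reference, the per-process limit on total virtual size is RLIMIT_AS (what
"ulimit -v" sets). A minimal sketch of launching a job under such a cap,
assuming a Unix system with Python's standard resource module; the helper
names are hypothetical and the cap value is whatever the caller picks.

```python
import resource
import subprocess


def clamp_to_hard(limit_bytes, hard):
    """Never request a soft limit above the existing hard limit."""
    if hard == resource.RLIM_INFINITY:
        return limit_bytes
    return min(limit_bytes, hard)


def cap_address_space(limit_bytes):
    """Return a hook that caps the calling process's virtual address space."""
    def hook():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = clamp_to_hard(limit_bytes, hard)
        # Lower only the soft limit; the hard limit is left untouched.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return hook


def run_capped(argv, limit_bytes):
    """Run argv as a child whose address space is capped at limit_bytes."""
    # preexec_fn runs in the child between fork() and exec().
    return subprocess.run(argv, preexec_fn=cap_address_space(limit_bytes))
```

With the cap in place, an application that checks malloc()==NULL sees the
failure itself instead of pushing the whole machine into memory shortage.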

-Manish

* System hang problem.
@ 2006-10-03 22:07 Manish Neema
  2006-10-03 23:35 ` Keith Mannthey
  2006-10-04  0:33 ` Alan Cox
  0 siblings, 2 replies; 9+ messages in thread
From: Manish Neema @ 2006-10-03 22:07 UTC
  To: linux-kernel

Sorry, I've lost my patience with RedHat, so I'm posting here.

We see this problem frequently on RHEL3.0 U5 and U7: the system hangs
completely when memory runs short. The only option left is a
power-cycle (or 'sysrq + b'). The hang occurs with any of the 3
overcommit settings below:

   - default (heuristic) overcommit (overcommit_memory=0)
   - no overcommit handling by kernel (overcommit_memory=1)
   - restrictive overcommit with ratio=100% (overcommit_memory=2,
     overcommit_ratio=100)
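
To make the three settings concrete, here is a small sketch of how they map
to the /proc/sys/vm tunables and what mode 2 allows. It assumes the
documented rule that the commit limit in mode 2 is swap plus
overcommit_ratio percent of RAM (newer kernels also account for huge pages,
which is ignored here); the function names are hypothetical.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit, no checks",
    2: "strict accounting against swap + overcommit_ratio% of RAM",
}


def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio):
    """Approximate commit limit in mode 2: swap + ratio% of RAM."""
    return swap_kib + ram_kib * overcommit_ratio // 100


def read_vm_setting(name):
    """Read an integer tunable from /proc/sys/vm (Linux only)."""
    with open(f"/proc/sys/vm/{name}") as fh:
        return int(fh.read())
```

For example, read_vm_setting("overcommit_memory") returns the current mode,
and with ratio=100 the limit is simply RAM plus swap.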

RHEL3.0 U3 would generate an OOM kill "each and every time" it sensed a
system hang, but due to other bugs we had to move away from it. RedHat
calls the timely (at least for us) invocation of OOM in U3 a buggy
implementation and the delayed OOM kill in U5 and U7 the right
implementation (which we rarely get to see, resulting in at least 5
systems hanging daily!).

Changing overcommit to 2 (and the ratio to anywhere from 1 to 99) results
in certain OS processes (the automount daemon, e.g.) getting killed when
all the allowed memory is committed. What is the point of reserving some
memory if a random root process gets killed, leaving the system in a
totally unknown state?
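
One way to see how close a box is to "all the allowed memory is committed"
is to compare CommitLimit with Committed_AS. A minimal sketch, assuming the
kernel's /proc/meminfo exposes both fields (not verified for the RHEL3
kernel discussed here); the helper names are hypothetical.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip header or free-form lines whose first value is not a number.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kib(meminfo):
    """Return CommitLimit - Committed_AS, or None if either field is absent."""
    try:
        return meminfo["CommitLimit"] - meminfo["Committed_AS"]
    except KeyError:
        return None
```

Typical use would be commit_headroom_kib(parse_meminfo(open("/proc/meminfo").read()));
a value near zero means the next fork() or large allocation is likely to fail.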

Any suggestions on how we can prevent the system hang without having
automount (or any other root process) die?

TIA,
-Manish Neema

P.S. Sorry, we cannot move away from RHEL3.0 U7 for a while.


Thread overview: 9+ messages
2006-10-04  0:07 System hang problem Manish Neema
2006-10-04  0:49 ` Alan Cox
2006-10-04  3:26 ` Willy Tarreau
2006-10-04  9:53   ` Jarek Poplawski
  -- strict thread matches above, loose matches on Subject: below --
2006-10-04 14:24 Al Boldi
2006-10-04  5:44 Manish Neema
2006-10-03 22:07 Manish Neema
2006-10-03 23:35 ` Keith Mannthey
2006-10-04  0:33 ` Alan Cox
