* tmpfs and the OOM killer
@ 2007-04-11 5:23 Pedro
2007-04-11 19:48 ` Willy Tarreau
0 siblings, 1 reply; 17+ messages in thread
From: Pedro @ 2007-04-11 5:23 UTC (permalink / raw)
To: linux-kernel
After suffering for some days from an unconfigured or misconfigured tmpfs,
and given that the OOM killer is not POSIX, I think that resizing tmpfs to
make use of its empty space would be better than killing processes.
I'm using kernel 2.6.20.4. If someone asks, I'll send a test application.
* Re: tmpfs and the OOM killer
From: Willy Tarreau @ 2007-04-11 19:48 UTC (permalink / raw)
To: Pedro; +Cc: linux-kernel
On Wed, Apr 11, 2007 at 02:23:31AM -0300, Pedro wrote:
> After suffering some days from a not|mis configured tmpfs,
>
> As the OOM killer is not Posix,
>
> Better than to kill processes would be to resize tmpfs, to use tmpfs empty
> space.
Will not work, because tmpfs does not use any memory for unused space. If
you don't believe me, simply create a large file on your tmpfs, then check
free memory, then remove the file and check free memory again.
So your problem is not caused by the empty space on tmpfs, but either by
too much space used on tmpfs or by your application using too much memory.
> I'm using kernel 2.6.20.4. If someone ask I'll send a test application.
Not needed, the one-liner "main(){while(malloc(4096));}" is enough to
trigger an OOM.
If you cannot control your application's memory usage, you'll have to finely
tune the overcommit_ratio.
Regards,
Willy
* RE: tmpfs and the OOM killer
From: Mouawad, Tony @ 2007-04-11 21:05 UTC (permalink / raw)
To: Willy Tarreau, Pedro; +Cc: linux-kernel
Can someone describe the process of finding the best value to tune the
overcommit_ratio to?
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Willy Tarreau
Sent: Wednesday, April 11, 2007 3:48 PM
To: Pedro
Cc: linux-kernel@vger.kernel.org
Subject: Re: tmpfs and the OOM killer
On Wed, Apr 11, 2007 at 02:23:31AM -0300, Pedro wrote:
> After suffering some days from a not|mis configured tmpfs,
>
> As the OOM killer is not Posix,
>
> Better than to kill processes would be to resize tmpfs, to use tmpfs
> empty space.
Will not work, because tmpfs does not use any memory for unused space.
If you don't believe me, simply create a large file on your tmpfs, then
check free memory, then remove the file and check free memory again.
So your problem is not caused by the empty space on tmpfs, but either by
too much space used on tmpfs or by your application using too much
memory.
> I'm using kernel 2.6.20.4. If someone ask I'll send a test
> application.
Not needed, the one-liner "main(){while(malloc(4096));}" is enough to
trigger an OOM.
If you cannot control your application's memory usage, you'll have to
finely tune the overcommit_ratio.
Regards,
Willy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
* Re: tmpfs and the OOM killer
From: Pedro @ 2007-04-11 22:27 UTC (permalink / raw)
To: linux-kernel
On Wednesday 11 April 2007 16:48, Willy Tarreau wrote:
> On Wed, Apr 11, 2007 at 02:23:31AM -0300, Pedro wrote:
> > After suffering some days from a not|mis configured tmpfs,
> >
> > As the OOM killer is not Posix,
> >
> > Better than to kill processes would be to resize tmpfs, to use tmpfs
> > empty space.
>
> Will not work, because tmpfs does not use any memory for unused space. If
> you don't believe me, simply create a large file on your tmpfs, then check
> free memory, then remove the file and check free memory again.
>
> So your problem is not caused by the empty space on tmpfs, but either by
> too much space used on tmpfs or by your application using too much memory.
>
...
>
> If you cannot control your application's memory usage, you'll have to
> finely tune the overcommit_ratio.
>
> Regards,
> Willy
You are right.
But now I have two questions:
1) Why is the tmpfs total size fixed even if, at the moment it is checked,
that much memory is not available?
2) How should an application be written to not be killed by OOM?
* Re: tmpfs and the OOM killer
From: Alan Cox @ 2007-04-11 22:39 UTC (permalink / raw)
To: Pedro; +Cc: linux-kernel
> 2) How should an application be written to not be killed by OOM?
OOM isn't an application matter. The kernel has to choose between
allowing overcommit, on the basis that it might run out of memory and have
to kill stuff, or not allowing it, in which case an application which
correctly handles malloc() and similar failures will not be killed (unless
it is out of space on a stack grow, which is a C language flaw as you can't
catch that event in C).
It's configured by /proc/sys/vm/overcommit_memory
0 - try and spot obviously dumb allocations
1 - anything goes
2 - strictly control resource commit
* Re: tmpfs and the OOM killer
From: Al Boldi @ 2007-04-12 5:04 UTC (permalink / raw)
To: linux-kernel
Pedro wrote:
> On Wednesday 11 April 2007 16:48, Willy Tarreau wrote:
> > On Wed, Apr 11, 2007 at 02:23:31AM -0300, Pedro wrote:
> > >
> > > As the OOM killer is not Posix,
> >
> > If you cannot control your application's memory usage, you'll have to
> > finely tune the overcommit_ratio.
>
> 2) How should an application be written to not be killed by OOM?
Try this:
# echo -17 > /proc/<pid>/oom_adj
Or this:
# echo 2 > /proc/sys/vm/overcommit_memory
# echo 95 > /proc/sys/vm/overcommit_ratio
Or this:
# ulimit -v [max vm]
Thanks, and good luck with the OOM killer!
--
Al
* Re: tmpfs and the OOM killer
From: Pedro @ 2007-04-12 5:19 UTC (permalink / raw)
To: linux-kernel
On Wednesday 11 April 2007 19:39, Alan Cox wrote:
> > 2) How should an application be written to not be killed by OOM?
>
> OOM isn't an application matter. The kernel has to choose between
> allowing overcommit on the basis it might run out of memory and have to
> kill stuff, or that it won't in which case an application which correctly
> handles malloc() and similar failures will not be killed (unless it is
> out of space on a stack grow which is a C language flaw as you can't
> catch that event in C)
>
> It's configured by /proc/sys/vm/overcommit_memory
>
> 0 - try and spot obviously dumb allocations
> 1 - anything goes
> 2 - strictly control resource commit
I deduce that a fail-safe application must scanf overcommit_memory, warn
the user and waitpid.
* Re: tmpfs and the OOM killer
From: Pedro @ 2007-04-12 5:39 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-kernel
On Thursday 12 April 2007 02:04, Al Boldi wrote:
> > Pedro wrote:
> > 2) How should an application be written to not be killed by OOM?
>
> Try this:
>
> # echo -17 > /proc/<pid>/oom_adj
I should have known that running a fail-safe application is a superuser
privilege. Sorry for wasting your time.
* Re: tmpfs and the OOM killer
From: Jan Engelhardt @ 2007-04-12 8:13 UTC (permalink / raw)
To: Willy Tarreau; +Cc: Pedro, linux-kernel
On Apr 11 2007 21:48, Willy Tarreau wrote:
>On Wed, Apr 11, 2007 at 02:23:31AM -0300, Pedro wrote:
>> After suffering some days from a not|mis configured tmpfs,
>>
>> As the OOM killer is not Posix,
>>
>> Better than to kill processes would be to resize tmpfs, to use tmpfs empty
>> space.
>
>Will not work, because tmpfs does not use any memory for unused space. If
>you don't believe me, simply create a large file on your tmpfs, then check
>free memory, then remove the file and check free memory again.
>
>So your problem is not caused by the empty space on tmpfs, but either by
>too much space used on tmpfs or by your application using too much memory.
>
>> I'm using kernel 2.6.20.4. If someone ask I'll send a test application.
>
>Not needed, the one-liner "main(){while(malloc(4096));}" is enough to
>trigger an OOM.
No, that won't do anything, malloc happily returns NULL after a few seconds.
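#include <stdlib.h>

int main(void)
{
	while (1) {
		char *p = malloc(4096);
		if (!p)		/* allocation refused: stop instead of writing through NULL */
			return 1;
		*p = 1;		/* touch the page so it really gets backed by memory */
	}
}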
This is more likely to trigger OOM, because it actually dirties the page.
Jan
* Re: tmpfs and the OOM killer
From: Willy Tarreau @ 2007-04-12 8:19 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Pedro, linux-kernel
On Thu, Apr 12, 2007 at 10:13:05AM +0200, Jan Engelhardt wrote:
>
> On Apr 11 2007 21:48, Willy Tarreau wrote:
> >On Wed, Apr 11, 2007 at 02:23:31AM -0300, Pedro wrote:
> >> After suffering some days from a not|mis configured tmpfs,
> >>
> >> As the OOM killer is not Posix,
> >>
> >> Better than to kill processes would be to resize tmpfs, to use tmpfs empty
> >> space.
> >
> >Will not work, because tmpfs does not use any memory for unused space. If
> >you don't believe me, simply create a large file on your tmpfs, then check
> >free memory, then remove the file and check free memory again.
> >
> >So your problem is not caused by the empty space on tmpfs, but either by
> >too much space used on tmpfs or by your application using too much memory.
> >
> >> I'm using kernel 2.6.20.4. If someone ask I'll send a test application.
> >
> >Not needed, the one-liner "main(){while(malloc(4096));}" is enough to
> >trigger an OOM.
>
> No, that won't do anything, malloc happily returns NULL after a few seconds.
>
> int main()
> {
> while(1) {
> char *p = malloc(4096);
> *p = 1;
> }
> }
>
> This is more likely to trigger OOM, because it actually dirties the page.
Yes, you're right, and that's indeed what my "freemem" program does ;-)
Willy
* Re: tmpfs and the OOM killer
From: Alan Cox @ 2007-04-12 9:42 UTC (permalink / raw)
To: Pedro; +Cc: linux-kernel
> > 0 - try and spot obviously dumb allocations
> > 1 - anything goes
> > 2 - strictly control resource commit
>
> I deduce that a fail-safe application must scanf overcommit_memory, warn
> the user and waitpid.
If you are building a fail safe system you need to look a bit beyond out
of memory handling settings - power supplies, failover, fault tolerance
requirements, error detection (eg ECC ram), raid arrays over
non-electrically conductive links etc.
Alan
* Re: tmpfs and the OOM killer
From: Theodore Tso @ 2007-04-12 11:25 UTC (permalink / raw)
To: Pedro; +Cc: linux-kernel
On Thu, Apr 12, 2007 at 02:19:02AM -0300, Pedro wrote:
> > OOM isn't an application matter. The kernel has to choose between
> > allowing overcommit on the basis it might run out of memory and have to
> > kill stuff, or that it won't in which case an application which correctly
> > handles malloc() and similar failures will not be killed (unless it is
> > out of space on a stack grow which is a C language flaw as you can't
> > catch that event in C)
> >
> > It's configured by /proc/sys/vm/overcommit_memory
> >
> > 0 - try and spot obviously dumb allocations
> > 1 - anything goes
> > 2 - strictly control resource commit
>
> I deduce that a fail-safe application must scanf overcommit_memory, warn
> the user and waitpid.
If a fail-safe application is running on a system which is that close
to the edge in terms of available physical memory and swap, it's
likely going to be in deep trouble anyway. Even if you disable the
OOM killer, now random malloc()'s will start returning NULL because
your system doesn't have enough memory. Do you have intelligent error
handling and recovery mechanisms for every single malloc() failure?
Also, the machine will likely be thrashing so badly that any service
level performance guarantees that the application might have will
probably be totally trashed.
- Ted
* Re: tmpfs and the OOM killer
From: Pedro @ 2007-04-12 14:35 UTC (permalink / raw)
To: linux-kernel
On Thursday 12 April 2007 08:25, Theodore Tso wrote:
> likely going to be in deep trouble anyway. Even if you disable the
> OOM killer, now random malloc()'s will start returning NULL because
> your system doesn't have enough memory. Do you have intelligent error
> handling and recovery mechanisms for every single malloc() failure?
When malloc returns NULL, the process can report ENOMEM to the user.
When the OOM killer kills the process, the user only complains that the
program sometimes dies.
* Re: tmpfs and the OOM killer
From: Pedro @ 2007-04-12 15:08 UTC (permalink / raw)
To: linux-kernel
I wrote:
> > I deduce that a fail-safe application must scanf overcommit_memory,
> > warn the user and waitpid.
Alan Cox wrote:
> If you are building a fail safe system you need to look a bit beyond out
> of memory handling settings - power supplies, failover, fault tolerance
> requirements, error detection (eg ECC ram), raid arrays over
> non-electrically conductive links etc.
If you lived here, you'd be worried about the seismic-electric impact on
your HDD and data ;) So SIGKILL is not the hardest problem to be solved.
Maybe it is time to write an avoidoomd.
* Re: tmpfs and the OOM killer
From: Willy Tarreau @ 2007-04-12 20:31 UTC (permalink / raw)
To: Pedro; +Cc: linux-kernel
On Thu, Apr 12, 2007 at 11:35:32AM -0300, Pedro wrote:
> On Thursday 12 April 2007 08:25, Theodore Tso wrote:
> > likely going to be in deep trouble anyway. Even if you disable the
> > OOM killer, now random malloc()'s will start returning NULL because
> > your system doesn't have enough memory. Do you have intelligent error
> > handling and recovery mechanisms for every single malloc() failure?
>
> When malloc return NULL, the process may tell the user ENOMEM.
> When OOM kill the process, the user claim the program sometimes die.
Then use overcommit_memory=2. The default overcommit mode is a convenience
provided to let poorly designed applications run even when they claim to
need gigs of RAM but only use a few tens of megs.
If your application correctly handles malloc(), simply switch overcommit
to 2 and let the system refuse to allocate memory when none is available,
then your application will be aware of this by the NULL result to malloc()
calls. It is a normal behaviour.
I do have appliances which run perfectly controlled software with
overcommit_memory=2 and overcommit_ratio around 70% and without any
swap, and they work like a charm. It just requires some finer grained
tuning on your side. I don't see what the problem is here. You know
the app, you know how much RAM you want to allocate to it, you know
how much you want to keep free. Then say this to the system.
BTW, ulimit -v is your friend here too, and does not require root.
Regards,
Willy
* RE: tmpfs and the OOM killer
From: Mouawad, Tony @ 2007-04-12 20:56 UTC (permalink / raw)
To: Willy Tarreau, Pedro; +Cc: linux-kernel
I have noticed that with overcommit_memory=2 and overcommit_ratio=100,
my system cannot leverage as much ram as it could if it was configured
for overcommit_memory=0.
Is this because when overcommit_memory=2, anything that mallocs memory
but doesn't touch that memory is counted as used memory? I see a value
in /proc/meminfo called Committed_AS: and it seems to reflect what has
been malloced in the system but not necessarily touched. Is this true?
Cheers,
Tony
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Willy Tarreau
Sent: Thursday, April 12, 2007 4:32 PM
To: Pedro
Cc: linux-kernel@vger.kernel.org
Subject: Re: tmpfs and the OOM killer
On Thu, Apr 12, 2007 at 11:35:32AM -0300, Pedro wrote:
> On Thursday 12 April 2007 08:25, Theodore Tso wrote:
> > likely going to be in deep trouble anyway. Even if you disable the
> > OOM killer, now random malloc()'s will start returning NULL because
> > your system doesn't have enough memory. Do you have intelligent
> > error handling and recovery mechanisms for every single malloc()
> > failure?
>
> When malloc return NULL, the process may tell the user ENOMEM.
> When OOM kill the process, the user claim the program sometimes die.
Then use overcommit=2. The default overcommit mode is a convenience
provided to allow poorly designed applications run even when they pretend
they need gigs of RAM when they only use a few tens of megs.
If your application correctly handles malloc(), simply switch overcommit
to 2 and let the system refuse to allocate memory when none is
available, then your application will be aware of this by the NULL
result to malloc() calls. It is a normal behaviour.
I do have appliances which run perfectly controlled software with
overcommit_mode=2 and overcommit_ratio around 70% and without any
swap, and they work like a charm. It just requires some finer grained
tuning on your side. I don't see what the problem is here. You know
the app, you know how much RAM you want to allocate to it, you know
how much you want to keep free. Then say this to the system.
BTW, ulimit -v is your friend here too, and does not require to be root.
Regards,
Willy
* Re: tmpfs and the OOM killer
From: Willy Tarreau @ 2007-04-12 21:09 UTC (permalink / raw)
To: Mouawad, Tony; +Cc: Pedro, linux-kernel
On Thu, Apr 12, 2007 at 04:56:24PM -0400, Mouawad, Tony wrote:
> I have noticed that with overcommit_memory=2 and overcommit_ratio=100,
> my system cannot leverage as much ram as it could if it was configured
> for overcommit_memory=0.
>
> Is this because when overcommit_memory=2, anything that mallocs memory
> but doesn't touch that memory is counted as used memory?
Most probably, yes. This shows that your system may end up hitting OOM
in mode 0 if one of your applications suddenly decides to use all the
memory it has allocated.
> I see a value
> in /proc/meminfo called Committed_AS: and it seems to reflect what has
> been malloced in the system but not necessarily touched. Is this true?
Yes (at least I do think so).
Regards,
Willy