Memory overcommit

* Memory overcommit
@ 2005-12-09 22:00 Tracy R Reed
  2005-12-11  2:00 ` Kip Macy
  0 siblings, 1 reply; 128+ messages in thread
From: Tracy R Reed @ 2005-12-09 22:00 UTC (permalink / raw)
  To: xen-devel

I have been using Xen on a daily basis on a production (but not
critical) machine for a number of months now. It's looking really good.

One thing that I have not yet seen anyone mention as a feature that I
would really like to see is the ability to overcommit memory. I have 2G
of RAM in my machine. I would like to give a developer his own virtual
domain to sandbox his application development without having to dedicate
a whole piece of hardware to just him. But I know he won't really log in
and use it all that often. If I give him 512M of my 2G that's 25% of my
memory that will likely be unutilized most of the time. It would be
great if I could assign more memory to domains than I actually have and
just let it swap out idle pages. I bet there are a lot of boxes out
there, especially in webserver colo's, that really don't get much
traffic and really don't need as much RAM as they have in them for
normal day to day operations. Just let them swap everything back in and
use up to the maximum RAM configured for that domain if they get busy
and need it but let it swap out the rest of the time so other busier
domains can use the physical RAM.

This is feature #1 on my Xen wishlist. Is there any work going into this
area?

-- 
Tracy R Reed
http://copilotconsulting.com
1-877-MY-COPILOT

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2005-12-09 22:00 Memory overcommit Tracy R Reed
@ 2005-12-11  2:00 ` Kip Macy
  2005-12-11 15:45   ` Keir Fraser
  0 siblings, 1 reply; 128+ messages in thread
From: Kip Macy @ 2005-12-11  2:00 UTC (permalink / raw)
  To: Tracy R Reed; +Cc: xen-devel


The balloon driver provides a mechanism for reducing a guest's memory
allocation at run-time. At least in principle, it would not be difficult to
write a small app that provided policy to slim down idle domUs.

                -Kip

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2005-12-11  2:00 ` Kip Macy
@ 2005-12-11 15:45   ` Keir Fraser
  2005-12-11 19:59     ` Rik van Riel
  0 siblings, 1 reply; 128+ messages in thread
From: Keir Fraser @ 2005-12-11 15:45 UTC (permalink / raw)
  To: treed; +Cc: xen-devel List

Even better, if each domain is only used occasionally, then you could 
save them to disc and restore them on demand. Then they consume no 
memory when not in use. You could combine this with use of the balloon 
driver to implement an allocation policy for in-use domains (as Kip 
suggests) --- e.g., if X domains are active, each gets 1/X of available 
memory.
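
A minimal sketch of the "each active domain gets 1/X" policy, assuming a
fixed reservation for dom0. The function name, the units (KiB) and the
reservation parameter are illustrative assumptions, not part of the Xen
tools.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy helper: split the memory left after a fixed
 * reservation (for example dom0) equally among the active domains.
 * Returns the per-domain balloon target in KiB, or 0 if none are active. */
size_t balloon_target_kib(size_t total_kib, size_t reserved_kib,
                          size_t active_domains)
{
    assert(reserved_kib <= total_kib);
    if (active_domains == 0)
        return 0;               /* nothing to balloon */
    return (total_kib - reserved_kib) / active_domains;
}
```

The result would then be handed to the balloon driver through the
toolstack, for example with `xm mem-set` in the Xen 3 tools.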

  -- Keir

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2005-12-11 15:45   ` Keir Fraser
@ 2005-12-11 19:59     ` Rik van Riel
  2005-12-13 16:10       ` Keir Fraser
  0 siblings, 1 reply; 128+ messages in thread
From: Rik van Riel @ 2005-12-11 19:59 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel List, treed

On Sun, 11 Dec 2005, Keir Fraser wrote:

> Even better, if each domain is only used occasionally, then you could 
> save them to disc and restore them on demand.

How would one implement such a "wake on LAN" functionality
for Xen domains ?

> You could combine this with use of the balloon driver to implement an 
> allocation policy for in-use domains (as Kip suggests)

Better yet, balloon a domain down to something really small
before swapping it out.  Then the latency to get it running
again when activity happens can be kept to a minimum.

-- 
All Rights Reversed

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2005-12-11 19:59     ` Rik van Riel
@ 2005-12-13 16:10       ` Keir Fraser
  2005-12-13 16:25         ` Jacob Gorm Hansen
  0 siblings, 1 reply; 128+ messages in thread
From: Keir Fraser @ 2005-12-13 16:10 UTC (permalink / raw)
  To: Rik van Riel; +Cc: xen-devel List, treed

On 11 Dec 2005, at 19:59, Rik van Riel wrote:

>> Even better, if each domain is only used occasionally, then you could
>> save them to disc and restore them on demand.
>
> How would one implement such a "wake on LAN" functionality
> for Xen domains ?

There are various ways. Maybe require explicit signalling from the user 
via some command to the control-plane tools (start up my domain / shut 
down my domain). You could also do it implicitly (although it may 
require some iptables hacking) by tracking network connections to the 
domain's IP address -- start it up on first connection / shut it down 
when last connection finishes.
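
The implicit variant can be sketched as a connection counter that reports
when the domain should be restored or saved. This is only an illustration
under assumed names; how connections are observed (iptables or otherwise)
and how the domain is actually saved and restored is left out.

```c
#include <assert.h>
#include <stddef.h>

/* What the control plane should do after a connection event. */
enum domain_action { ACTION_NONE, ACTION_RESTORE, ACTION_SAVE };

/* Hypothetical per-domain state: number of open connections to its IP. */
struct conn_tracker {
    unsigned int open_connections;
};

/* First connection to a sleeping domain triggers a restore. */
enum domain_action on_connection_opened(struct conn_tracker *t)
{
    assert(t != NULL);
    return (t->open_connections++ == 0) ? ACTION_RESTORE : ACTION_NONE;
}

/* Closing the last connection triggers a save to disk. */
enum domain_action on_connection_closed(struct conn_tracker *t)
{
    assert(t != NULL && t->open_connections > 0);
    return (--t->open_connections == 0) ? ACTION_SAVE : ACTION_NONE;
}
```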

  -- Keir

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2005-12-13 16:10       ` Keir Fraser
@ 2005-12-13 16:25         ` Jacob Gorm Hansen
  0 siblings, 0 replies; 128+ messages in thread
From: Jacob Gorm Hansen @ 2005-12-13 16:25 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel List, treed

On 12/13/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> On 11 Dec 2005, at 19:59, Rik van Riel wrote:
>
> >> Even better, if each domain is only used occasionally, then you could
> >> save them to disc and restore them on demand.
> >
> > How would one implement such a "wake on LAN" functionality
> > for Xen domains ?
>
> There are various ways. Maybe require explicit signalling from the user
> via some command to the control-plane tools (start up my domain / shut
> down my domain). You could also do it implicitly (although it may
> require some iptables hacking) by tracking network connections to the
> domain's IP address -- start it up on first connection / shut it down
> when last connection finishes.

Or you could use my self-checkpointing code to have the domain save
itself, then free most of its pages, except for a small stub that
would listen for some 'wake up' event on the network interface (e.g. a
TCP SYN packet). The stub could then allocate new pages and resume the
checkpoint from disk.

Jacob

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Memory overcommit
@ 2009-10-12 11:51 Vedran Furač
  2009-10-13  3:08 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 128+ messages in thread
From: Vedran Furač @ 2009-10-12 11:51 UTC (permalink / raw)
  To: linux-kernel

Hi! I don't know if this is the appropriate place to ask such questions;
if not, please point me to the right one.

Let's simulate a process gone berserk with this piece of code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  char *buf;
  while(1) {
    buf = malloc (1024*1024*100);
    if ( buf == NULL ) {
      perror("malloc");
      getchar();
      exit(EXIT_FAILURE);
    }
    sleep(1);
    memset(buf, 1, 1024*1024*100);
  }
  return 0;
}

# echo 0 > /proc/sys/vm/overcommit_memory

Compile and run it, and soon the result is:
- System freezes for a second or two
- OOMK wakes up
- X crashes

Now I'm back on VT1, and dmesg shows 8 processes were killed by the OOMK
(including the X server and some long-running daemons with a small memory
footprint, like automount) before the real culprit was killed. This
random killing spree *really* gives Linux a bad reputation, and people
usually point to it as an argument against it.

But, there is an easy fix:
# echo 2 > /proc/sys/vm/overcommit_memory
Run the program again and after a few seconds you'll get:

"malloc: Cannot allocate memory"

and that's all that happens. Nothing gets killed, and one (and others
too) can continue to work without losing time, data or both. The only
somewhat strange thing is that the kernel contradicts itself: it says
there is no more memory while at the same time reporting:

/proc/meminfo
MemTotal:        3542532 kB
MemFree:          892972 kB
Buffers:            2664 kB
Cached:           130940 kB

...that there is almost 900MB free memory. But OK, I can live with it.
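
One possible reading of the apparent contradiction: in mode 2 the kernel
does not compare allocations against MemFree but against a commit limit,
which the kernel's overcommit documentation describes as swap plus a
configurable percentage of RAM (vm.overcommit_ratio, default 50). The
sketch below is only the arithmetic; it ignores huge pages, and the ratio
and swap values are assumptions, since the post does not state them.

```c
#include <assert.h>
#include <limits.h>

/* Approximate commit limit in kB for vm.overcommit_memory=2:
 * swap + RAM * overcommit_ratio / 100 (huge pages ignored). */
unsigned long long commit_limit_kb(unsigned long long mem_total_kb,
                                   unsigned long long swap_total_kb,
                                   unsigned int ratio_percent)
{
    /* Guard the multiplication against overflow. */
    assert(ratio_percent == 0 || mem_total_kb <= ULLONG_MAX / ratio_percent);
    return swap_total_kb + mem_total_kb * ratio_percent / 100;
}
```

With MemTotal 3542532 kB, no swap and the default ratio of 50, this gives
1771266 kB, so malloc() can fail at roughly 1.7 GB of committed address
space while MemFree still shows memory available.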

So, my question is: why isn't overcommit turned off *by default* today?
I have had it turned off for a few years now, and the only side effect is
that processes no longer get killed randomly and I don't lose valuable
time and data.

Regards,

Vedran

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-12 11:51 Vedran Furač
@ 2009-10-13  3:08 ` KAMEZAWA Hiroyuki
  2009-10-13 17:13   ` Vedran Furač
  0 siblings, 1 reply; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-13  3:08 UTC (permalink / raw)
  To: vedran.furac; +Cc: Vedran Furač, linux-kernel

On Mon, 12 Oct 2009 13:51:07 +0200
Vedran Furač <vedranf@vedranf.mine.nu> wrote:
> /proc/meminfo
> MemTotal:        3542532 kB
> MemFree:          892972 kB
> Buffers:            2664 kB
> Cached:           130940 kB
> 
> ...that there is almost 900MB free memory. But OK, I can live with it.
> 
> So, my question is: why today overcommit isn't turned off *by default*?
> I have it turned off for a few years now and only side effect is that I
> don't get processes killed randomly anymore, I don't loose valuable time
> and data.
>
Does "isn't turned off" mean "vm.overcommit_memory==2"?
And which kernel version are you running?
Does the oom-killer still pick "definitely-not-guilty" processes?

I guess the reason for the default value is that the kernel assumes processes
will not always use their whole mmapped range. There will be unused ranges in
a process's virtual memory, and they can be big.

For example, a typical case on a server:
when you run a multi-threaded program (like a Java VM),

  - stack per thread
  - malloc() arena per thread

can make the difference between size-of-mapped-range and used-pages bigger.
I saw gigabytes of unused range on an ia64 host... the stack size was big.

IIUC, the size is determined by ulimit's stack size by default. It's 10M on
my x86-64 host.
You'll see 1G of committed usage when you run 100 no-op threads.
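
The arithmetic behind that figure can be sketched as follows. The helper
names are illustrative. The second function assumes the glibc behavior
described in pthread_create(3), where the soft RLIMIT_STACK value, if not
unlimited, becomes the default stack size of new threads.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/* Address space committed for stacks if every thread gets the same size. */
unsigned long long committed_stack_bytes(unsigned int threads,
                                         unsigned long long stack_bytes)
{
    /* Guard the multiplication against overflow. */
    assert(threads == 0 || stack_bytes <= ULLONG_MAX / threads);
    return (unsigned long long)threads * stack_bytes;
}

/* Soft stack limit of the calling process in bytes,
 * or 0 if it is unlimited or cannot be read. */
unsigned long long soft_stack_limit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}
```

For 100 threads with 10 MiB stacks this yields 1048576000 bytes, about
1 GB, even if the threads never touch their stacks.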

And if the strict check (vm.overcommit_memory=2) is used, mmap() returns -ENOMEM
whenever it hits the limit.
You have to find "which process should be killed" by yourself, anyway.

Against random kills, you have 2 choices.

1. use  /proc/<pid>/oom_adj
2. use  memory cgroup.

An easier-to-use method would be appreciated. For now we have the above 2.
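
As an illustration of the first option, here is a small sketch that writes
a value to /proc/<pid>/oom_adj. The function names are made up for this
example. The range -17..15, with -17 exempting the task from the OOM
killer, is the one used by kernels of this era; later kernels replaced the
file with oom_score_adj, so treat this as specific to 2.6-series systems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

#define OOM_ADJ_MIN (-17)   /* -17 exempts the task from the OOM killer */
#define OOM_ADJ_MAX 15      /* 15 makes the task the preferred victim */

/* Build "/proc/<pid>/oom_adj" in buf; returns snprintf's result. */
int oom_adj_path(pid_t pid, char *buf, size_t len)
{
    return snprintf(buf, len, "/proc/%ld/oom_adj", (long)pid);
}

/* Write adj to the task's oom_adj file. Returns 0 on success, -1 on
 * failure (no such pid, no permission, or file not present). */
int set_oom_adj(pid_t pid, int adj)
{
    char path[64];
    FILE *f;
    int rc;

    assert(adj >= OOM_ADJ_MIN && adj <= OOM_ADJ_MAX);
    oom_adj_path(pid, path, sizeof(path));
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    rc = (fprintf(f, "%d\n", adj) < 0) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```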

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-13  3:08 ` KAMEZAWA Hiroyuki
@ 2009-10-13 17:13   ` Vedran Furač
  2009-10-14  4:51     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 128+ messages in thread
From: Vedran Furač @ 2009-10-13 17:13 UTC (permalink / raw)
  To: linux-kernel

KAMEZAWA Hiroyuki wrote:

> On Mon, 12 Oct 2009 13:51:07 +0200 Vedran Furač
> <vedranf@vedranf.mine.nu> wrote:
>> /proc/meminfo
>> MemTotal:        3542532 kB
>> MemFree:          892972 kB
>> Buffers:            2664 kB
>> Cached:           130940 kB
>> 
>> ...that there is almost 900MB free memory. But OK, I can live with
>> it.
>> 
>> So, my question is: why today overcommit isn't turned off *by
>> default*? I have it turned off for a few years now and only side
>> effect is that I don't get processes killed randomly anymore, I
>> don't loose valuable time and data.
>> 
> "isn't turned off" means "vm.overcommit_memory==2" ?

Yes, "2: always check, never overcommit" as per proc(5)

> And...what's version your kernel is ?

Applies to every 2.6.

> oom-killer still finds "definitely-not-guilty" ones ?

Yes. It's always repeatable. Just compile and run that code. I'll
probably just file a bug report.

> I guess the reason of default value is that the kernel assume
> processes will not always use all mmaped range. There will be unused
> range in process's virtual memory and it can be big.
> 
> For example, typical case in a server, when you run multi-thread
> program (like java VM),
> 
> - stack per thread
> - malloc() arena per thread
> 
> can makes difference among size-of-mapped-range v.s. used-pages
> bigger. I saw Gigabytes of unused range on ia64 host,...statck size
> was big.

Yes, I noticed that the JVM allocates gigabytes but then uses less than 10%
of that and, as a consequence, eclipse sometimes fails to start although
there's plenty of free memory. So overcommitting is a kind of workaround
for broken software that allocates not what it needs but what it might
need in some rare occurrences. I would rather fix this userland software
than risk OOM situations and random killing of innocent processes.

> And if strict check(vm.ovecommit_memory=2) is used, mmap() return
> -ENOMEM whenever it hits limit.

% strace -f -e mmap java -version
[...]
mmap(NULL, 996147200, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot
allocate memory)

And that should be fine.

> Against random-kill, you may have 2 choices.
> 
> 1. use  /proc/<pid>/oom_adj
> 2. use  memory cgroup.
> 
> Something more easy-to-use method may be appriciated. We have above 2
> now.

These are just bad workarounds for a bad OOM algorithm. I tested this
little program on multiple systems (including Windows) without any
tweaking, and Linux's behavior is, unfortunately, *the worst*.  :/


Regards,

Vedran


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-13 17:13   ` Vedran Furač
@ 2009-10-14  4:51     ` KAMEZAWA Hiroyuki
  2009-10-20 21:52       ` Vedran Furač
  0 siblings, 1 reply; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-14  4:51 UTC (permalink / raw)
  To: vedran.furac; +Cc: Vedran Furač, linux-kernel

On Tue, 13 Oct 2009 19:13:34 +0200
Vedran Furač <vedranf@vedranf.mine.nu> wrote:

> KAMEZAWA Hiroyuki wrote:
> > I guess the reason of default value is that the kernel assume
> > processes will not always use all mmaped range. There will be unused
> > range in process's virtual memory and it can be big.
> > 
> > For example, typical case in a server, when you run multi-thread
> > program (like java VM),
> > 
> > - stack per thread
> > - malloc() arena per thread
> > 
> > can makes difference among size-of-mapped-range v.s. used-pages
> > bigger. I saw Gigabytes of unused range on ia64 host,...statck size
> > was big.
> 
> Yes, I noticed that JVM allocates gigabytes but then uses less than 10%
> of that and, as a consequence, eclipse sometimes fails to start although
> there's plenty of free memory. So overcommiting is some kind of a
> workaround for broken software that allocate not what they need but what
> they might need in some rare occurrences. I would rather like fixing
> this userland software than risking OOM situations and random killing of
> innocent processes.
> 

In my understanding, mmap() these days is just for requesting virtual
address space, not for requesting memory.
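
A minimal sketch of that usage pattern: reserve a large range with no
access rights, then make only the part that is needed usable. The function
name is illustrative. The comment about accounting reflects how Linux is
generally understood to charge private writable mappings under strict
overcommit and should be read as an assumption, not a guarantee.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve reserve_bytes of address space with PROT_NONE, then make the
 * first commit_bytes readable and writable. Under strict overcommit only
 * the writable part is expected to count against the commit limit.
 * Returns the base address, or NULL on failure. */
void *reserve_then_commit(size_t reserve_bytes, size_t commit_bytes)
{
    void *base;

    assert(commit_bytes <= reserve_bytes);
    base = mmap(NULL, reserve_bytes, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (commit_bytes > 0 &&
        mprotect(base, commit_bytes, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, reserve_bytes);
        return NULL;
    }
    return base;
}
```

By contrast, the failing JVM call in the strace output above asks for the
whole range with PROT_READ|PROT_WRITE|PROT_EXEC at once.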



> > And if strict check(vm.ovecommit_memory=2) is used, mmap() return
> > -ENOMEM whenever it hits limit.
> 
> % strace -f -e mmap java -version
> [...]
> mmap(NULL, 996147200, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot
> allocate memory)
> 
> And that should be fine.
> 
It's not fine for me ;)

> > Against random-kill, you may have 2 choices.
> > 
> > 1. use  /proc/<pid>/oom_adj
> > 2. use  memory cgroup.
> > 
> > Something more easy-to-use method may be appriciated. We have above 2
> > now.
> 
> These are just bad workarounds for bad OOM algorithm. I tested this
> little program on multiple systems (including windows) without any
> tweaking and linux behavior is, unfortunately *the worst*.  :/
> 
> 
Yes, they are workarounds. You can use /etc/sysctl.conf.
But if it were made the default _now_, many threaded programs would not work.

But I agree, the OOM killer should be more sophisticated.
Please give us a sample program/test case which causes the problem.
linux-mm@kvack.org may be a better place. lkml has too much traffic.

Regards,
-Kame


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-14  4:51     ` KAMEZAWA Hiroyuki
@ 2009-10-20 21:52       ` Vedran Furač
  2009-10-26  1:55         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 128+ messages in thread
From: Vedran Furač @ 2009-10-20 21:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm

Hi and sorry for delay. Also, please CC me.

KAMEZAWA Hiroyuki wrote:

> On Tue, 13 Oct 2009 19:13:34 +0200
> Vedran Furač <vedranf@vedranf.mine.nu> wrote:
> 
>>> Against random-kill, you may have 2 choices.
>>>
>>> 1. use  /proc/<pid>/oom_adj
>>> 2. use  memory cgroup.
>>>
>>> Something more easy-to-use method may be appriciated. We have above 2
>>> now.
>> These are just bad workarounds for bad OOM algorithm. I tested this
>> little program on multiple systems (including windows) without any
>> tweaking and linux behavior is, unfortunately *the worst*.  :/
>>
> Yes, they are workaround. You can use /etc/sysctl.conf.
> But if making it default _now_, many threaded programs will not work.

Only Java ;) and only sometimes, at least in my experience

> But I agree, OOM killer should be sophisticated.
> Please give us a sample program/test case which causes problem.
> linux-mm@kvack.org may be a better place. lkml has too much traffic.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  char *buf;
  while(1) {
    buf = malloc (1024*1024*100);
    if ( buf == NULL ) {
      perror("malloc");
      getchar();
      exit(EXIT_FAILURE);
    }
    sleep(1);
    memset(buf, 1, 1024*1024*100);
  }
  return 0;
}


After running this on a typical desktop with gnome or kde, OOM killer
will kill 5-10 innocent processes before killing this one. Tested
multiple times on multiple installations.

Regards,

Vedran


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-20 21:52       ` Vedran Furač
@ 2009-10-26  1:55         ` KAMEZAWA Hiroyuki
  2009-10-26 16:16             ` Vedran Furač
  0 siblings, 1 reply; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-26  1:55 UTC (permalink / raw)
  To: vedran.furac; +Cc: linux-mm

On Tue, 20 Oct 2009 23:52:33 +0200
Vedran Furač <vedran.furac@gmail.com> wrote:

> Hi and sorry for delay. Also, please CC me.

> > But I agree, OOM killer should be sophisticated.
> > Please give us a sample program/test case which causes problem.
> > linux-mm@kvack.org may be a better place. lkml has too much traffic.
> 
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> int main()
> {
>   char *buf;
>   while(1) {
>     buf = malloc (1024*1024*100);
>     if ( buf == NULL ) {
>       perror("malloc");
>       getchar();
>       exit(EXIT_FAILURE);
>     }
>     sleep(1);
>     memset(buf, 1, 1024*1024*100);
>   }
>   return 0;
> }
> 
> 
> After running this on a typical desktop with gnome or kde, OOM killer
> will kill 5-10 innocent processes before killing this one. Tested
> multiple times on multiple installations.
> 
> Regards,
> 
Can I ask a few more questions?

 - Which CPU?
 - How much memory?
 - Do you have swap?
 - What's the latest kernel version you tested?
 - Could you show me /var/log/dmesg and /var/log/messages at OOM?
 
Thanks,
-Kame



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-26  1:55         ` KAMEZAWA Hiroyuki
@ 2009-10-26 16:16             ` Vedran Furač
  0 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-26 16:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel

KAMEZAWA Hiroyuki wrote:

> Can I make more questions ?

Sure

>  - What's cpu ?

vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : AMD Phenom(tm) II X3 720 Processor
stepping        : 2
cpu MHz         : 3314.812
cache size      : 512 KB

>  - How much memory ?
>  - Do you have swap ?

           total       used       free     shared    buffers     cached
Mem:        3459       1452       2007          0         65        622
-/+ buffers/cache:      764       2695
Swap:          0          0          0

So, no swap. Don't need it.

>  - What's the latest kernel version you tested?

2.6.30-2-amd64 #1 SMP (on Debian)

>  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?

It was a catastrophe. :) X crashed (or was killed) along with all the
programs, but my little program stayed alive for 20 minutes (see
timestamps). And for that time the computer was completely unusable. I
couldn't even get a console via ssh. Really embarrassing for a modern OS
to get destroyed by 5 lines of C run as an ordinary user. Luckily screen
was still alive; oomk usually kills it too. See for yourself:

dmesg: http://pastebin.com/f3f83738a
messages: http://pastebin.com/f2091110a

(CCing lkml again... I just want people to see the logs.)

Regards,

Vedran

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-26 16:16             ` Vedran Furač
@ 2009-10-27  3:22               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-27  3:22 UTC (permalink / raw)
  To: vedran.furac; +Cc: linux-mm, linux-kernel, kosaki.motohiro@jp.fujitsu.com

On Mon, 26 Oct 2009 17:16:14 +0100
Vedran Furač <vedran.furac@gmail.com> wrote:
> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
> 
> It was catastrophe. :) X crashed (or killed) with all the programs, but
> my little program was alive for 20 minutes (see timestamps). And for
> that time computer was completely unusable. Couldn't even get the
> console via ssh. Rally embarrassing for a modern OS to get destroyed by
> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
> oomk usually kills it also. See for yourself:
> 
> dmesg: http://pastebin.com/f3f83738a
> messages: http://pastebin.com/f2091110a
> 
> (CCing to lklm again... I just want people to see the logs.)
> 
Thank you for the report and your patience. It does seem strange
that your KDE programs are killed. I agree.

I attached a script for checking the oom_score of all existing processes.
(oom_score is the value used for selecting "bad" processes.)
Please run it if you have time.

This is the result from my own desktop (on a virtual machine).
In this environment (total memory is 1.6 GBytes), an mmap(1G) program is running.

%check_badness.pl | sort -n | tail
--
89924	3938	mixer_applet2
90210	3942	tomboy
94753	3936	clock-applet
101994	3919	pulseaudio
113525	4028	gnome-terminal
127340	1	init
128177	3871	nautilus
151003	11515	bash
256944	11653	mmap
425561	3829	gnome-session
--
Sigh, gnome-session has twice the score of mmap(1G).
Of course, gnome-session only uses 6 Mbytes of anon memory.
I wonder whether this is because gnome-session has many children... but I
need to dig more. Does anyone have an idea?
(CCed kosaki)

Thanks,
-Kame





[-- Attachment #2: check_badness.pl --]
[-- Type: text/x-perl, Size: 313 bytes --]

#!/usr/bin/perl

open(LINE, "ps -A -o pid,comm | grep -v PID|") || die "can't ps";

while (<LINE>) {
	/^\s*([0-9]+)\s+(.*)$/;
	$PID=$1;
	$COMM=$2;
	open(SCORE, "/proc/$PID/oom_score") || next;
	$oom_score = <SCORE>;
	chomp($oom_score);
	close(SCORE);
	print $oom_score."\t".$PID . "\t",$COMM."\n";
}
close(LINE);

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  3:22               ` KAMEZAWA Hiroyuki
@ 2009-10-27  6:10                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 128+ messages in thread
From: KOSAKI Motohiro @ 2009-10-27  6:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: vedran.furac, linux-mm, linux-kernel

2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> On Mon, 26 Oct 2009 17:16:14 +0100
> Vedran Furač <vedran.furac@gmail.com> wrote:
>> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>>
>> It was a catastrophe. :) X crashed (or was killed) with all the programs,
>> but my little program was alive for 20 minutes (see timestamps). And for
>> that time the computer was completely unusable. Couldn't even get a
>> console via ssh. Really embarrassing for a modern OS to get destroyed by
>> 5 lines of C run as an ordinary user. Luckily screen was still alive,
>> oomk usually kills it also. See for yourself:
>>
>> dmesg: http://pastebin.com/f3f83738a
>> messages: http://pastebin.com/f2091110a
>>
>> (CCing to lkml again... I just want people to see the logs.)
>>
> Thank you for reporting and for your patience. I agree that it seems
> strange that your KDE programs were killed.
>
> I attached a script for checking the oom_score of all existing processes.
> (oom_score is the value used for selecting "bad" processes.)
> Please run it if you have time.
>
> This is the result on my own desktop (in a virtual machine).
> In this environment (total memory is 1.6 GB), an mmap(1G) program is running.
>
> %check_badness.pl | sort -n | tail
> --
> 89924   3938    mixer_applet2
> 90210   3942    tomboy
> 94753   3936    clock-applet
> 101994  3919    pulseaudio
> 113525  4028    gnome-terminal
> 127340  1       init
> 128177  3871    nautilus
> 151003  11515   bash
> 256944  11653   mmap
> 425561  3829    gnome-session
> --
> Sigh, gnome-session has about 1.7 times the score of mmap(1G).
> Of course, gnome-session only uses 6 MB of anonymous memory.
> I wonder whether this is because gnome-session has many children, but I
> need to dig more. Does anyone have an idea?
> (CCed kosaki)

The following output explains the issue.
Modern desktop applications link against a great many libraries, which
bloats the VSS and increases the OOM score.

Ideally, we shouldn't account evictable file-backed mappings in oom_score.


# cat /proc/`pidof gnome-session`/maps
00400000-00433000 r-xp 00000000 fd:00 100061
  /usr/bin/gnome-session
00632000-00637000 rw-p 00032000 fd:00 100061
  /usr/bin/gnome-session
00949000-00a10000 rw-p 00000000 00:00 0                                  [heap]
34cf600000-34cf61f000 r-xp 00000000 fd:00 1088
  /lib64/ld-2.10.1.so
34cf81e000-34cf81f000 r--p 0001e000 fd:00 1088
  /lib64/ld-2.10.1.so
34cf81f000-34cf820000 rw-p 0001f000 fd:00 1088
  /lib64/ld-2.10.1.so
34cfa00000-34cfb64000 r-xp 00000000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfb64000-34cfd64000 ---p 00164000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfd64000-34cfd68000 r--p 00164000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfd68000-34cfd69000 rw-p 00168000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfd69000-34cfd6e000 rw-p 00000000 00:00 0
34cfe00000-34cfe82000 r-xp 00000000 fd:00 1104
  /lib64/libm-2.10.1.so
34cfe82000-34d0082000 ---p 00082000 fd:00 1104
  /lib64/libm-2.10.1.so
34d0082000-34d0083000 r--p 00082000 fd:00 1104
  /lib64/libm-2.10.1.so
34d0083000-34d0084000 rw-p 00083000 fd:00 1104
  /lib64/libm-2.10.1.so
34d0200000-34d0202000 r-xp 00000000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0202000-34d0402000 ---p 00002000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0402000-34d0403000 r--p 00002000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0403000-34d0404000 rw-p 00003000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0600000-34d0617000 r-xp 00000000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0617000-34d0816000 ---p 00017000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0816000-34d0817000 r--p 00016000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0817000-34d0818000 rw-p 00017000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0818000-34d081c000 rw-p 00000000 00:00 0
34d0a00000-34d0a15000 r-xp 00000000 fd:00 1113
  /lib64/libz.so.1.2.3
34d0a15000-34d0c14000 ---p 00015000 fd:00 1113
  /lib64/libz.so.1.2.3
34d0c14000-34d0c15000 rw-p 00014000 fd:00 1113
  /lib64/libz.so.1.2.3
34d0e00000-34d0e07000 r-xp 00000000 fd:00 1091
  /lib64/librt-2.10.1.so
34d0e07000-34d1006000 ---p 00007000 fd:00 1091
  /lib64/librt-2.10.1.so
34d1006000-34d1007000 r--p 00006000 fd:00 1091
  /lib64/librt-2.10.1.so
34d1007000-34d1008000 rw-p 00007000 fd:00 1091
  /lib64/librt-2.10.1.so
34d1200000-34d121c000 r-xp 00000000 fd:00 1097
  /lib64/libselinux.so.1
34d121c000-34d141b000 ---p 0001c000 fd:00 1097
  /lib64/libselinux.so.1
34d141b000-34d141c000 r--p 0001b000 fd:00 1097
  /lib64/libselinux.so.1
34d141c000-34d141d000 rw-p 0001c000 fd:00 1097
  /lib64/libselinux.so.1
34d141d000-34d141e000 rw-p 00000000 00:00 0
34d1600000-34d16dd000 r-xp 00000000 fd:00 1092
  /lib64/libglib-2.0.so.0.2000.4
34d16dd000-34d18dc000 ---p 000dd000 fd:00 1092
  /lib64/libglib-2.0.so.0.2000.4
34d18dc000-34d18de000 rw-p 000dc000 fd:00 1092
  /lib64/libglib-2.0.so.0.2000.4
34d1a00000-34d1a41000 r-xp 00000000 fd:00 1094
  /lib64/libgobject-2.0.so.0.2000.4
34d1a41000-34d1c41000 ---p 00041000 fd:00 1094
  /lib64/libgobject-2.0.so.0.2000.4
34d1c41000-34d1c43000 rw-p 00041000 fd:00 1094
  /lib64/libgobject-2.0.so.0.2000.4
34d1e00000-34d1e02000 r-xp 00000000 fd:00 1115
  /usr/lib64/libXau.so.6.0.0
34d1e02000-34d2001000 ---p 00002000 fd:00 1115
  /usr/lib64/libXau.so.6.0.0
34d2001000-34d2002000 rw-p 00001000 fd:00 1115
  /usr/lib64/libXau.so.6.0.0
34d2200000-34d2203000 r-xp 00000000 fd:00 1096
  /lib64/libgmodule-2.0.so.0.2000.4
34d2203000-34d2402000 ---p 00003000 fd:00 1096
  /lib64/libgmodule-2.0.so.0.2000.4
34d2402000-34d2403000 rw-p 00002000 fd:00 1096
  /lib64/libgmodule-2.0.so.0.2000.4
34d2600000-34d261a000 r-xp 00000000 fd:00 1116
  /usr/lib64/libxcb.so.1.1.0
34d261a000-34d281a000 ---p 0001a000 fd:00 1116
  /usr/lib64/libxcb.so.1.1.0
34d281a000-34d281b000 rw-p 0001a000 fd:00 1116
  /usr/lib64/libxcb.so.1.1.0
34d2a00000-34d2b34000 r-xp 00000000 fd:00 1117
  /usr/lib64/libX11.so.6.2.0
34d2b34000-34d2d33000 ---p 00134000 fd:00 1117
  /usr/lib64/libX11.so.6.2.0
34d2d33000-34d2d39000 rw-p 00133000 fd:00 1117
  /usr/lib64/libX11.so.6.2.0
34d2e00000-34d2e04000 r-xp 00000000 fd:00 1093
  /lib64/libgthread-2.0.so.0.2000.4
34d2e04000-34d3003000 ---p 00004000 fd:00 1093
  /lib64/libgthread-2.0.so.0.2000.4
34d3003000-34d3004000 rw-p 00003000 fd:00 1093
  /lib64/libgthread-2.0.so.0.2000.4
34d3200000-34d3226000 r-xp 00000000 fd:00 1111
  /lib64/libexpat.so.1.5.2
34d3226000-34d3425000 ---p 00026000 fd:00 1111
  /lib64/libexpat.so.1.5.2
34d3425000-34d3428000 rw-p 00025000 fd:00 1111
  /lib64/libexpat.so.1.5.2
34d3600000-34d3676000 r-xp 00000000 fd:00 1098
  /lib64/libgio-2.0.so.0.2000.4
34d3676000-34d3875000 ---p 00076000 fd:00 1098
  /lib64/libgio-2.0.so.0.2000.4
34d3875000-34d3877000 rw-p 00075000 fd:00 1098
  /lib64/libgio-2.0.so.0.2000.4
34d3877000-34d3878000 rw-p 00000000 00:00 0
34d3a00000-34d3a93000 r-xp 00000000 fd:00 1110
  /usr/lib64/libfreetype.so.6.3.20
34d3a93000-34d3c93000 ---p 00093000 fd:00 1110
  /usr/lib64/libfreetype.so.6.3.20
34d3c93000-34d3c99000 rw-p 00093000 fd:00 1110
  /usr/lib64/libfreetype.so.6.3.20
34d3e00000-34d3e04000 r-xp 00000000 fd:00 1141
  /lib64/libattr.so.1.1.0
34d3e04000-34d4003000 ---p 00004000 fd:00 1141
  /lib64/libattr.so.1.1.0
34d4003000-34d4004000 rw-p 00003000 fd:00 1141
  /lib64/libattr.so.1.1.0
34d4200000-34d4211000 r-xp 00000000 fd:00 1123
  /usr/lib64/libXext.so.6.4.0
34d4211000-34d4411000 ---p 00011000 fd:00 1123
  /usr/lib64/libXext.so.6.4.0
34d4411000-34d4412000 rw-p 00011000 fd:00 1123
  /usr/lib64/libXext.so.6.4.0
34d4600000-34d4604000 r-xp 00000000 fd:00 1142
  /lib64/libcap.so.2.16
34d4604000-34d4803000 ---p 00004000 fd:00 1142
  /lib64/libcap.so.2.16
34d4803000-34d4804000 rw-p 00003000 fd:00 1142
  /lib64/libcap.so.2.16
34d4a00000-34d4a33000 r-xp 00000000 fd:00 1112
  /usr/lib64/libfontconfig.so.1.4.1
34d4a33000-34d4c32000 ---p 00033000 fd:00 1112
  /usr/lib64/libfontconfig.so.1.4.1
34d4c32000-34d4c34000 rw-p 00032000 fd:00 1112
  /usr/lib64/libfontconfig.so.1.4.1
34d4e00000-34d4e25000 r-xp 00000000 fd:00 1114
  /usr/lib64/libpng12.so.0.37.0
34d4e25000-34d5024000 ---p 00025000 fd:00 1114
  /usr/lib64/libpng12.so.0.37.0
34d5024000-34d5025000 rw-p 00024000 fd:00 1114
  /usr/lib64/libpng12.so.0.37.0
34d5200000-34d523c000 r-xp 00000000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d523c000-34d543c000 ---p 0003c000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d543c000-34d543d000 r--p 0003c000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d543d000-34d543e000 rw-p 0003d000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d5600000-34d5609000 r-xp 00000000 fd:00 1118
  /usr/lib64/libXrender.so.1.3.0
34d5609000-34d5808000 ---p 00009000 fd:00 1118
  /usr/lib64/libXrender.so.1.3.0
34d5808000-34d5809000 rw-p 00008000 fd:00 1118
  /usr/lib64/libXrender.so.1.3.0
34d5a00000-34d5a2c000 r-xp 00000000 fd:00 1121
  /usr/lib64/libpangoft2-1.0.so.0.2400.5
34d5a2c000-34d5c2b000 ---p 0002c000 fd:00 1121
  /usr/lib64/libpangoft2-1.0.so.0.2400.5
34d5c2b000-34d5c2d000 rw-p 0002b000 fd:00 1121
  /usr/lib64/libpangoft2-1.0.so.0.2400.5
34d5e00000-34d5e46000 r-xp 00000000 fd:00 1120
  /usr/lib64/libpango-1.0.so.0.2400.5
34d5e46000-34d6046000 ---p 00046000 fd:00 1120
  /usr/lib64/libpango-1.0.so.0.2400.5
34d6046000-34d6049000 rw-p 00046000 fd:00 1120
  /usr/lib64/libpango-1.0.so.0.2400.5
34d6200000-34d6209000 r-xp 00000000 fd:00 1128
  /usr/lib64/libXcursor.so.1.0.2
34d6209000-34d6409000 ---p 00009000 fd:00 1128
  /usr/lib64/libXcursor.so.1.0.2
34d6409000-34d640a000 rw-p 00009000 fd:00 1128
  /usr/lib64/libXcursor.so.1.0.2
34d6600000-34d6674000 r-xp 00000000 fd:00 1119
  /usr/lib64/libcairo.so.2.10800.8
34d6674000-34d6873000 ---p 00074000 fd:00 1119
  /usr/lib64/libcairo.so.2.10800.8
34d6873000-34d6876000 rw-p 00073000 fd:00 1119
  /usr/lib64/libcairo.so.2.10800.8
34d6a00000-34d6a02000 r-xp 00000000 fd:00 1129
  /usr/lib64/libXcomposite.so.1.0.0
34d6a02000-34d6c01000 ---p 00002000 fd:00 1129
  /usr/lib64/libXcomposite.so.1.0.0
34d6c01000-34d6c02000 rw-p 00001000 fd:00 1129
  /usr/lib64/libXcomposite.so.1.0.0
34d6e00000-34d6e99000 r-xp 00000000 fd:00 1132
  /usr/lib64/libgdk-x11-2.0.so.0.1600.5
34d6e99000-34d7099000 ---p 00099000 fd:00 1132
  /usr/lib64/libgdk-x11-2.0.so.0.1600.5
34d7099000-34d709e000 rw-p 00099000 fd:00 1132
  /usr/lib64/libgdk-x11-2.0.so.0.1600.5
34d7200000-34d7243000 r-xp 00000000 fd:00 1109
  /usr/lib64/libpixman-1.so.0.14.0
34d7243000-34d7442000 ---p 00043000 fd:00 1109
  /usr/lib64/libpixman-1.so.0.14.0
34d7442000-34d7445000 rw-p 00042000 fd:00 1109
  /usr/lib64/libpixman-1.so.0.14.0
34d7600000-34d761d000 r-xp 00000000 fd:00 1131
  /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5
34d761d000-34d781c000 ---p 0001d000 fd:00 1131
  /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5
34d781c000-34d781d000 rw-p 0001c000 fd:00 1131
  /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5
34d7a00000-34d7a08000 r-xp 00000000 fd:00 1126
  /usr/lib64/libXrandr.so.2.2.0
34d7a08000-34d7c07000 ---p 00008000 fd:00 1126
  /usr/lib64/libXrandr.so.2.2.0
34d7c07000-34d7c08000 rw-p 00007000 fd:00 1126
  /usr/lib64/libXrandr.so.2.2.0
34d7e00000-34d7e02000 r-xp 00000000 fd:00 1130
  /usr/lib64/libXdamage.so.1.1.0
34d7e02000-34d8001000 ---p 00002000 fd:00 1130
  /usr/lib64/libXdamage.so.1.1.0
34d8001000-34d8002000 rw-p 00001000 fd:00 1130
  /usr/lib64/libXdamage.so.1.1.0
34d8200000-34d8209000 r-xp 00000000 fd:00 1125
  /usr/lib64/libXi.so.6.0.0
34d8209000-34d8409000 ---p 00009000 fd:00 1125
  /usr/lib64/libXi.so.6.0.0
34d8409000-34d840a000 rw-p 00009000 fd:00 1125
  /usr/lib64/libXi.so.6.0.0
34d8600000-34d8602000 r-xp 00000000 fd:00 1124
  /usr/lib64/libXinerama.so.1.0.0
34d8602000-34d8801000 ---p 00002000 fd:00 1124
  /usr/lib64/libXinerama.so.1.0.0
34d8801000-34d8802000 rw-p 00001000 fd:00 1124
  /usr/lib64/libXinerama.so.1.0.0
34d8a00000-34d8a05000 r-xp 00000000 fd:00 1127
  /usr/lib64/libXfixes.so.3.1.0
34d8a05000-34d8c04000 ---p 00005000 fd:00 1127
  /usr/lib64/libXfixes.so.3.1.0
34d8c04000-34d8c05000 rw-p 00004000 fd:00 1127
  /usr/lib64/libXfixes.so.3.1.0
34d8e00000-34d91d6000 r-xp 00000000 fd:00 1134
  /usr/lib64/libgtk-x11-2.0.so.0.1600.5
34d91d6000-34d93d5000 ---p 003d6000 fd:00 1134
  /usr/lib64/libgtk-x11-2.0.so.0.1600.5
34d93d5000-34d93e0000 rw-p 003d5000 fd:00 1134
  /usr/lib64/libgtk-x11-2.0.so.0.1600.5
34d93e0000-34d93e2000 rw-p 00000000 00:00 0
34d9400000-34d941d000 r-xp 00000000 fd:00 1133
  /usr/lib64/libatk-1.0.so.0.2511.1
34d941d000-34d961c000 ---p 0001d000 fd:00 1133
  /usr/lib64/libatk-1.0.so.0.2511.1
34d961c000-34d961f000 rw-p 0001c000 fd:00 1133
  /usr/lib64/libatk-1.0.so.0.2511.1
34d9800000-34d980b000 r-xp 00000000 fd:00 1122
  /usr/lib64/libpangocairo-1.0.so.0.2400.5
34d980b000-34d9a0a000 ---p 0000b000 fd:00 1122
  /usr/lib64/libpangocairo-1.0.so.0.2400.5
34d9a0a000-34d9a0b000 rw-p 0000a000 fd:00 1122
  /usr/lib64/libpangocairo-1.0.so.0.2400.5
34d9c00000-34d9c20000 r-xp 00000000 fd:00 1144
  /usr/lib64/libdbus-glib-1.so.2.1.0
34d9c20000-34d9e1f000 ---p 00020000 fd:00 1144
  /usr/lib64/libdbus-glib-1.so.2.1.0
34d9e1f000-34d9e21000 rw-p 0001f000 fd:00 1144
  /usr/lib64/libdbus-glib-1.so.2.1.0
34da000000-34da003000 r-xp 00000000 fd:00 16360
  /lib64/libuuid.so.1.2
34da003000-34da203000 ---p 00003000 fd:00 16360
  /lib64/libuuid.so.1.2
34da203000-34da204000 rw-p 00003000 fd:00 16360
  /lib64/libuuid.so.1.2
34da800000-34da85d000 r-xp 00000000 fd:00 1145
  /usr/lib64/libORBit-2.so.0.1.0
34da85d000-34daa5c000 ---p 0005d000 fd:00 1145
  /usr/lib64/libORBit-2.so.0.1.0
34daa5c000-34daa6f000 rw-p 0005c000 fd:00 1145
  /usr/lib64/libORBit-2.so.0.1.0
34db000000-34db039000 r-xp 00000000 fd:00 1146
  /usr/lib64/libgconf-2.so.4.1.5
34db039000-34db239000 ---p 00039000 fd:00 1146
  /usr/lib64/libgconf-2.so.4.1.5
34db239000-34db23e000 rw-p 00039000 fd:00 1146
  /usr/lib64/libgconf-2.so.4.1.5
34db400000-34db407000 r-xp 00000000 fd:00 16361
  /usr/lib64/libSM.so.6.0.0
34db407000-34db607000 ---p 00007000 fd:00 16361
  /usr/lib64/libSM.so.6.0.0
34db607000-34db608000 rw-p 00007000 fd:00 16361
  /usr/lib64/libSM.so.6.0.0
34db800000-34db817000 r-xp 00000000 fd:00 16359
  /usr/lib64/libICE.so.6.3.0
34db817000-34dba17000 ---p 00017000 fd:00 16359
  /usr/lib64/libICE.so.6.3.0
34dba17000-34dba18000 rw-p 00017000 fd:00 16359
  /usr/lib64/libICE.so.6.3.0
34dba18000-34dba1c000 rw-p 00000000 00:00 0
34dd000000-34dd019000 r-xp 00000000 fd:00 1139
  /lib64/libgcc_s-4.4.1-20090729.so.1
34dd019000-34dd219000 ---p 00019000 fd:00 1139
  /lib64/libgcc_s-4.4.1-20090729.so.1
34dd219000-34dd21a000 rw-p 00019000 fd:00 1139
  /lib64/libgcc_s-4.4.1-20090729.so.1
34e0000000-34e0005000 r-xp 00000000 fd:00 26294
  /usr/lib64/libXtst.so.6.1.0
34e0005000-34e0205000 ---p 00005000 fd:00 26294
  /usr/lib64/libXtst.so.6.1.0
34e0205000-34e0206000 rw-p 00005000 fd:00 26294
  /usr/lib64/libXtst.so.6.1.0
34e5000000-34e5018000 r-xp 00000000 fd:00 29867
  /usr/lib64/libpolkit.so.2.0.0
34e5018000-34e5218000 ---p 00018000 fd:00 29867
  /usr/lib64/libpolkit.so.2.0.0
34e5218000-34e5219000 rw-p 00018000 fd:00 29867
  /usr/lib64/libpolkit.so.2.0.0
34e5800000-34e5805000 r-xp 00000000 fd:00 29887
  /usr/lib64/libogg.so.0.5.3
34e5805000-34e5a04000 ---p 00005000 fd:00 29887
  /usr/lib64/libogg.so.0.5.3
34e5a04000-34e5a05000 rw-p 00004000 fd:00 29887
  /usr/lib64/libogg.so.0.5.3
34e6400000-34e6408000 r-xp 00000000 fd:00 1177
  /usr/lib64/libltdl.so.7.2.0
34e6408000-34e6608000 ---p 00008000 fd:00 1177
  /usr/lib64/libltdl.so.7.2.0
34e6608000-34e6609000 rw-p 00008000 fd:00 1177
  /usr/lib64/libltdl.so.7.2.0
34e7400000-34e740c000 r-xp 00000000 fd:00 29868
  /usr/lib64/libpolkit-dbus.so.2.0.0
34e740c000-34e760b000 ---p 0000c000 fd:00 29868
  /usr/lib64/libpolkit-dbus.so.2.0.0
34e760b000-34e760c000 rw-p 0000b000 fd:00 29868
  /usr/lib64/libpolkit-dbus.so.2.0.0
34e7800000-34e781f000 r-xp 00000000 fd:00 29888
  /usr/lib64/libvorbis.so.0.4.0
34e781f000-34e7a1e000 ---p 0001f000 fd:00 29888
  /usr/lib64/libvorbis.so.0.4.0
34e7a1e000-34e7a2d000 rw-p 0001e000 fd:00 29888
  /usr/lib64/libvorbis.so.0.4.0
34e7c00000-34e7c0a000 r-xp 00000000 fd:00 29869
  /usr/lib64/libpolkit-grant.so.2.0.0
34e7c0a000-34e7e09000 ---p 0000a000 fd:00 29869
  /usr/lib64/libpolkit-grant.so.2.0.0
34e7e09000-34e7e0a000 rw-p 00009000 fd:00 29869
  /usr/lib64/libpolkit-grant.so.2.0.0
34e8000000-34e8003000 r-xp 00000000 fd:00 29892
  /usr/lib64/libcanberra-gtk.so.0.0.5
34e8003000-34e8203000 ---p 00003000 fd:00 29892
  /usr/lib64/libcanberra-gtk.so.0.0.5
34e8203000-34e8204000 rw-p 00003000 fd:00 29892
  /usr/lib64/libcanberra-gtk.so.0.0.5
34e8800000-34e880f000 r-xp 00000000 fd:00 29891
  /usr/lib64/libcanberra.so.0.1.5
34e880f000-34e8a0e000 ---p 0000f000 fd:00 29891
  /usr/lib64/libcanberra.so.0.1.5
34e8a0e000-34e8a0f000 rw-p 0000e000 fd:00 29891
  /usr/lib64/libcanberra.so.0.1.5
34e9000000-34e9007000 r-xp 00000000 fd:00 29889
  /usr/lib64/libvorbisfile.so.3.2.0
34e9007000-34e9206000 ---p 00007000 fd:00 29889
  /usr/lib64/libvorbisfile.so.3.2.0
34e9206000-34e9207000 rw-p 00006000 fd:00 29889
  /usr/lib64/libvorbisfile.so.3.2.0
34e9400000-34e940d000 r-xp 00000000 fd:00 29890
  /usr/lib64/libtdb.so.1.1.5
34e940d000-34e960c000 ---p 0000d000 fd:00 29890
  /usr/lib64/libtdb.so.1.1.5
34e960c000-34e960d000 rw-p 0000c000 fd:00 29890
  /usr/lib64/libtdb.so.1.1.5
34e9c00000-34e9c0a000 r-xp 00000000 fd:00 29870
  /usr/lib64/libpolkit-gnome.so.0.0.0
34e9c0a000-34e9e0a000 ---p 0000a000 fd:00 29870
  /usr/lib64/libpolkit-gnome.so.0.0.0
34e9e0a000-34e9e0b000 rw-p 0000a000 fd:00 29870
  /usr/lib64/libpolkit-gnome.so.0.0.0
3d14400000-3d14541000 r-xp 00000000 fd:00 114
  /usr/lib64/libxml2.so.2.7.6
3d14541000-3d14740000 ---p 00141000 fd:00 114
  /usr/lib64/libxml2.so.2.7.6
3d14740000-3d1474a000 rw-p 00140000 fd:00 114
  /usr/lib64/libxml2.so.2.7.6
3d1474a000-3d1474b000 rw-p 00000000 00:00 0
3d14c00000-3d14c18000 r-xp 00000000 fd:00 48785
  /usr/lib64/libglade-2.0.so.0.0.7
3d14c18000-3d14e17000 ---p 00018000 fd:00 48785
  /usr/lib64/libglade-2.0.so.0.0.7
3d14e17000-3d14e19000 rw-p 00017000 fd:00 48785
  /usr/lib64/libglade-2.0.so.0.0.7
3d16800000-3d168ed000 r-xp 00000000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d168ed000-3d16aec000 ---p 000ed000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d16aec000-3d16af3000 r--p 000ec000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d16af3000-3d16af5000 rw-p 000f3000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d16af5000-3d16b0a000 rw-p 00000000 00:00 0
7f05a3fae000-7f05a3fc1000 r-xp 00000000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a3fc1000-7f05a41c0000 ---p 00013000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a41c0000-7f05a41c1000 r--p 00012000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a41c1000-7f05a41c2000 rw-p 00013000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a41d4000-7f05a41d7000 r-xp 00000000 fd:00 116786
  /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so
7f05a41d7000-7f05a43d6000 ---p 00003000 fd:00 116786
  /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so
7f05a43d6000-7f05a43d7000 rw-p 00002000 fd:00 116786
  /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so
7f05a43d7000-7f05a43db000 r-xp 00000000 fd:00 40602
  /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so
7f05a43db000-7f05a45db000 ---p 00004000 fd:00 40602
  /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so
7f05a45db000-7f05a45dc000 rw-p 00004000 fd:00 40602
  /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so
7f05a45dc000-7f05a45df000 r-xp 00000000 fd:00 82244
  /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so
7f05a45df000-7f05a47de000 ---p 00003000 fd:00 82244
  /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so
7f05a47de000-7f05a47df000 rw-p 00002000 fd:00 82244
  /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so
7f05a47df000-7f05a47fb000 r--p 00000000 fd:00 14540
  /usr/share/locale/ja/LC_MESSAGES/libc.mo
7f05a47fb000-7f05a480d000 r-xp 00000000 fd:00 53032
  /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so
7f05a480d000-7f05a4a0d000 ---p 00012000 fd:00 53032
  /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so
7f05a4a0d000-7f05a4a0e000 rw-p 00012000 fd:00 53032
  /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so
7f05a4a0e000-7f05a4a0f000 ---p 00000000 00:00 0
7f05a4a0f000-7f05a520f000 rw-p 00000000 00:00 0
7f05a520f000-7f05a521b000 r--p 00000000 fd:00 21639
  /usr/share/locale/ja/LC_MESSAGES/glib20.mo
7f05a521b000-7f05a5227000 r-xp 00000000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5227000-7f05a5426000 ---p 0000c000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5426000-7f05a5427000 r--p 0000b000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5427000-7f05a5428000 rw-p 0000c000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5428000-7f05a543a000 r--p 00000000 fd:00 25291
  /usr/share/locale/ja/LC_MESSAGES/GConf2.mo
7f05a543a000-7f05a544e000 r--p 00000000 fd:00 40242
  /usr/share/locale/ja/LC_MESSAGES/gtk20.mo
7f05a544e000-7f05aa520000 r--p 00000000 fd:00 14558
  /usr/lib/locale/locale-archive
7f05aa520000-7f05aa538000 rw-p 00000000 00:00 0
7f05aa53f000-7f05aa546000 r--s 00000000 fd:00 12712
  /usr/lib64/gconv/gconv-modules.cache
7f05aa546000-7f05aa54a000 r--p 00000000 fd:00 110980
  /usr/share/locale/ja/LC_MESSAGES/gnome-session-2.0.mo
7f05aa54a000-7f05aa54c000 rw-p 00000000 00:00 0
7fff45b42000-7fff45b57000 rw-p 00000000 00:00 0                          [stack]
7fff45be4000-7fff45be5000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
  [vsyscall]
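
To put numbers on a maps listing like the one above, the address ranges can
be summed per mapping type. The following is only a rough sketch, not the
kernel's badness calculation: the function name summarize_maps and the rule
"non-zero inode means file-backed" are assumptions made for illustration.
Lines that hold only a wrapped path (as in this mail) are skipped.

```python
# Sketch: split a /proc/<pid>/maps listing into file-backed and anonymous bytes.
import re

# One mapping line: "start-end perms offset dev inode [path]".
MAP_RE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+[0-9a-f]+\s+\S+\s+(\d+)\s*(.*)$"
)


def summarize_maps(text):
    """Return total virtual bytes of file-backed and anonymous mappings."""
    totals = {"file": 0, "anon": 0}
    for line in text.splitlines():
        m = MAP_RE.match(line.strip())
        if not m:
            continue  # e.g. a path that the mailer wrapped onto its own line
        start, end, _perms, inode, _path = m.groups()
        size = int(end, 16) - int(start, 16)
        # Assumption: a non-zero inode means the mapping is backed by a file.
        kind = "file" if int(inode) != 0 else "anon"
        totals[kind] += size
    return totals


if __name__ == "__main__":
    import sys

    # Usage: python summarize_maps.py < /proc/<pid>/maps
    result = summarize_maps(sys.stdin.read())
    for kind, size in sorted(result.items()):
        print(f"{kind}\t{size // 1024} kB")
```

Note that this counts virtual size, including the "---p" holes between
library segments, so it shows how much address space the libraries claim
rather than how much memory is actually resident.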

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
@ 2009-10-27  6:10                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 128+ messages in thread
From: KOSAKI Motohiro @ 2009-10-27  6:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: vedran.furac, linux-mm, linux-kernel

2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> On Mon, 26 Oct 2009 17:16:14 +0100
> Vedran Furač <vedran.furac@gmail.com> wrote:
>> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>>
>> It was a catastrophe. :) X crashed (or was killed) with all the programs,
>> but my little program was alive for 20 minutes (see timestamps). And for
>> that time the computer was completely unusable. Couldn't even get a
>> console via ssh. Really embarrassing for a modern OS to get destroyed by
>> 5 lines of C run as an ordinary user. Luckily screen was still alive,
>> oomk usually kills it also. See for yourself:
>>
>> dmesg: http://pastebin.com/f3f83738a
>> messages: http://pastebin.com/f2091110a
>>
>> (CCing to lkml again... I just want people to see the logs.)
>>
> Thank you for reporting and for your patience. I agree that it seems
> strange that your KDE programs were killed.
>
> I attached a script for checking the oom_score of all existing processes.
> (oom_score is the value used for selecting "bad" processes.)
> Please run it if you have time.
>
> This is the result on my own desktop (in a virtual machine).
> In this environment (total memory is 1.6 GB), an mmap(1G) program is running.
>
> %check_badness.pl | sort -n | tail
> --
> 89924   3938    mixer_applet2
> 90210   3942    tomboy
> 94753   3936    clock-applet
> 101994  3919    pulseaudio
> 113525  4028    gnome-terminal
> 127340  1       init
> 128177  3871    nautilus
> 151003  11515   bash
> 256944  11653   mmap
> 425561  3829    gnome-session
> --
> Sigh, gnome-session has about 1.7 times the score of mmap(1G).
> Of course, gnome-session only uses 6 MB of anonymous memory.
> I wonder whether this is because gnome-session has many children, but I
> need to dig more. Does anyone have an idea?
> (CCed kosaki)

The following output explains the issue.
Modern desktop applications link against a great many libraries, which
bloats the VSS and increases the OOM score.

Ideally, we shouldn't account evictable file-backed mappings in oom_score.


# cat /proc/`pidof gnome-session`/maps
00400000-00433000 r-xp 00000000 fd:00 100061
  /usr/bin/gnome-session
00632000-00637000 rw-p 00032000 fd:00 100061
  /usr/bin/gnome-session
00949000-00a10000 rw-p 00000000 00:00 0                                  [heap]
34cf600000-34cf61f000 r-xp 00000000 fd:00 1088
  /lib64/ld-2.10.1.so
34cf81e000-34cf81f000 r--p 0001e000 fd:00 1088
  /lib64/ld-2.10.1.so
34cf81f000-34cf820000 rw-p 0001f000 fd:00 1088
  /lib64/ld-2.10.1.so
34cfa00000-34cfb64000 r-xp 00000000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfb64000-34cfd64000 ---p 00164000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfd64000-34cfd68000 r--p 00164000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfd68000-34cfd69000 rw-p 00168000 fd:00 1089
  /lib64/libc-2.10.1.so
34cfd69000-34cfd6e000 rw-p 00000000 00:00 0
34cfe00000-34cfe82000 r-xp 00000000 fd:00 1104
  /lib64/libm-2.10.1.so
34cfe82000-34d0082000 ---p 00082000 fd:00 1104
  /lib64/libm-2.10.1.so
34d0082000-34d0083000 r--p 00082000 fd:00 1104
  /lib64/libm-2.10.1.so
34d0083000-34d0084000 rw-p 00083000 fd:00 1104
  /lib64/libm-2.10.1.so
34d0200000-34d0202000 r-xp 00000000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0202000-34d0402000 ---p 00002000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0402000-34d0403000 r--p 00002000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0403000-34d0404000 rw-p 00003000 fd:00 1095
  /lib64/libdl-2.10.1.so
34d0600000-34d0617000 r-xp 00000000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0617000-34d0816000 ---p 00017000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0816000-34d0817000 r--p 00016000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0817000-34d0818000 rw-p 00017000 fd:00 1090
  /lib64/libpthread-2.10.1.so
34d0818000-34d081c000 rw-p 00000000 00:00 0
34d0a00000-34d0a15000 r-xp 00000000 fd:00 1113
  /lib64/libz.so.1.2.3
34d0a15000-34d0c14000 ---p 00015000 fd:00 1113
  /lib64/libz.so.1.2.3
34d0c14000-34d0c15000 rw-p 00014000 fd:00 1113
  /lib64/libz.so.1.2.3
34d0e00000-34d0e07000 r-xp 00000000 fd:00 1091
  /lib64/librt-2.10.1.so
34d0e07000-34d1006000 ---p 00007000 fd:00 1091
  /lib64/librt-2.10.1.so
34d1006000-34d1007000 r--p 00006000 fd:00 1091
  /lib64/librt-2.10.1.so
34d1007000-34d1008000 rw-p 00007000 fd:00 1091
  /lib64/librt-2.10.1.so
34d1200000-34d121c000 r-xp 00000000 fd:00 1097
  /lib64/libselinux.so.1
34d121c000-34d141b000 ---p 0001c000 fd:00 1097
  /lib64/libselinux.so.1
34d141b000-34d141c000 r--p 0001b000 fd:00 1097
  /lib64/libselinux.so.1
34d141c000-34d141d000 rw-p 0001c000 fd:00 1097
  /lib64/libselinux.so.1
34d141d000-34d141e000 rw-p 00000000 00:00 0
34d1600000-34d16dd000 r-xp 00000000 fd:00 1092
  /lib64/libglib-2.0.so.0.2000.4
34d16dd000-34d18dc000 ---p 000dd000 fd:00 1092
  /lib64/libglib-2.0.so.0.2000.4
34d18dc000-34d18de000 rw-p 000dc000 fd:00 1092
  /lib64/libglib-2.0.so.0.2000.4
34d1a00000-34d1a41000 r-xp 00000000 fd:00 1094
  /lib64/libgobject-2.0.so.0.2000.4
34d1a41000-34d1c41000 ---p 00041000 fd:00 1094
  /lib64/libgobject-2.0.so.0.2000.4
34d1c41000-34d1c43000 rw-p 00041000 fd:00 1094
  /lib64/libgobject-2.0.so.0.2000.4
34d1e00000-34d1e02000 r-xp 00000000 fd:00 1115
  /usr/lib64/libXau.so.6.0.0
34d1e02000-34d2001000 ---p 00002000 fd:00 1115
  /usr/lib64/libXau.so.6.0.0
34d2001000-34d2002000 rw-p 00001000 fd:00 1115
  /usr/lib64/libXau.so.6.0.0
34d2200000-34d2203000 r-xp 00000000 fd:00 1096
  /lib64/libgmodule-2.0.so.0.2000.4
34d2203000-34d2402000 ---p 00003000 fd:00 1096
  /lib64/libgmodule-2.0.so.0.2000.4
34d2402000-34d2403000 rw-p 00002000 fd:00 1096
  /lib64/libgmodule-2.0.so.0.2000.4
34d2600000-34d261a000 r-xp 00000000 fd:00 1116
  /usr/lib64/libxcb.so.1.1.0
34d261a000-34d281a000 ---p 0001a000 fd:00 1116
  /usr/lib64/libxcb.so.1.1.0
34d281a000-34d281b000 rw-p 0001a000 fd:00 1116
  /usr/lib64/libxcb.so.1.1.0
34d2a00000-34d2b34000 r-xp 00000000 fd:00 1117
  /usr/lib64/libX11.so.6.2.0
34d2b34000-34d2d33000 ---p 00134000 fd:00 1117
  /usr/lib64/libX11.so.6.2.0
34d2d33000-34d2d39000 rw-p 00133000 fd:00 1117
  /usr/lib64/libX11.so.6.2.0
34d2e00000-34d2e04000 r-xp 00000000 fd:00 1093
  /lib64/libgthread-2.0.so.0.2000.4
34d2e04000-34d3003000 ---p 00004000 fd:00 1093
  /lib64/libgthread-2.0.so.0.2000.4
34d3003000-34d3004000 rw-p 00003000 fd:00 1093
  /lib64/libgthread-2.0.so.0.2000.4
34d3200000-34d3226000 r-xp 00000000 fd:00 1111
  /lib64/libexpat.so.1.5.2
34d3226000-34d3425000 ---p 00026000 fd:00 1111
  /lib64/libexpat.so.1.5.2
34d3425000-34d3428000 rw-p 00025000 fd:00 1111
  /lib64/libexpat.so.1.5.2
34d3600000-34d3676000 r-xp 00000000 fd:00 1098
  /lib64/libgio-2.0.so.0.2000.4
34d3676000-34d3875000 ---p 00076000 fd:00 1098
  /lib64/libgio-2.0.so.0.2000.4
34d3875000-34d3877000 rw-p 00075000 fd:00 1098
  /lib64/libgio-2.0.so.0.2000.4
34d3877000-34d3878000 rw-p 00000000 00:00 0
34d3a00000-34d3a93000 r-xp 00000000 fd:00 1110
  /usr/lib64/libfreetype.so.6.3.20
34d3a93000-34d3c93000 ---p 00093000 fd:00 1110
  /usr/lib64/libfreetype.so.6.3.20
34d3c93000-34d3c99000 rw-p 00093000 fd:00 1110
  /usr/lib64/libfreetype.so.6.3.20
34d3e00000-34d3e04000 r-xp 00000000 fd:00 1141
  /lib64/libattr.so.1.1.0
34d3e04000-34d4003000 ---p 00004000 fd:00 1141
  /lib64/libattr.so.1.1.0
34d4003000-34d4004000 rw-p 00003000 fd:00 1141
  /lib64/libattr.so.1.1.0
34d4200000-34d4211000 r-xp 00000000 fd:00 1123
  /usr/lib64/libXext.so.6.4.0
34d4211000-34d4411000 ---p 00011000 fd:00 1123
  /usr/lib64/libXext.so.6.4.0
34d4411000-34d4412000 rw-p 00011000 fd:00 1123
  /usr/lib64/libXext.so.6.4.0
34d4600000-34d4604000 r-xp 00000000 fd:00 1142
  /lib64/libcap.so.2.16
34d4604000-34d4803000 ---p 00004000 fd:00 1142
  /lib64/libcap.so.2.16
34d4803000-34d4804000 rw-p 00003000 fd:00 1142
  /lib64/libcap.so.2.16
34d4a00000-34d4a33000 r-xp 00000000 fd:00 1112
  /usr/lib64/libfontconfig.so.1.4.1
34d4a33000-34d4c32000 ---p 00033000 fd:00 1112
  /usr/lib64/libfontconfig.so.1.4.1
34d4c32000-34d4c34000 rw-p 00032000 fd:00 1112
  /usr/lib64/libfontconfig.so.1.4.1
34d4e00000-34d4e25000 r-xp 00000000 fd:00 1114
  /usr/lib64/libpng12.so.0.37.0
34d4e25000-34d5024000 ---p 00025000 fd:00 1114
  /usr/lib64/libpng12.so.0.37.0
34d5024000-34d5025000 rw-p 00024000 fd:00 1114
  /usr/lib64/libpng12.so.0.37.0
34d5200000-34d523c000 r-xp 00000000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d523c000-34d543c000 ---p 0003c000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d543c000-34d543d000 r--p 0003c000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d543d000-34d543e000 rw-p 0003d000 fd:00 1143
  /lib64/libdbus-1.so.3.4.0
34d5600000-34d5609000 r-xp 00000000 fd:00 1118
  /usr/lib64/libXrender.so.1.3.0
34d5609000-34d5808000 ---p 00009000 fd:00 1118
  /usr/lib64/libXrender.so.1.3.0
34d5808000-34d5809000 rw-p 00008000 fd:00 1118
  /usr/lib64/libXrender.so.1.3.0
34d5a00000-34d5a2c000 r-xp 00000000 fd:00 1121
  /usr/lib64/libpangoft2-1.0.so.0.2400.5
34d5a2c000-34d5c2b000 ---p 0002c000 fd:00 1121
  /usr/lib64/libpangoft2-1.0.so.0.2400.5
34d5c2b000-34d5c2d000 rw-p 0002b000 fd:00 1121
  /usr/lib64/libpangoft2-1.0.so.0.2400.5
34d5e00000-34d5e46000 r-xp 00000000 fd:00 1120
  /usr/lib64/libpango-1.0.so.0.2400.5
34d5e46000-34d6046000 ---p 00046000 fd:00 1120
  /usr/lib64/libpango-1.0.so.0.2400.5
34d6046000-34d6049000 rw-p 00046000 fd:00 1120
  /usr/lib64/libpango-1.0.so.0.2400.5
34d6200000-34d6209000 r-xp 00000000 fd:00 1128
  /usr/lib64/libXcursor.so.1.0.2
34d6209000-34d6409000 ---p 00009000 fd:00 1128
  /usr/lib64/libXcursor.so.1.0.2
34d6409000-34d640a000 rw-p 00009000 fd:00 1128
  /usr/lib64/libXcursor.so.1.0.2
34d6600000-34d6674000 r-xp 00000000 fd:00 1119
  /usr/lib64/libcairo.so.2.10800.8
34d6674000-34d6873000 ---p 00074000 fd:00 1119
  /usr/lib64/libcairo.so.2.10800.8
34d6873000-34d6876000 rw-p 00073000 fd:00 1119
  /usr/lib64/libcairo.so.2.10800.8
34d6a00000-34d6a02000 r-xp 00000000 fd:00 1129
  /usr/lib64/libXcomposite.so.1.0.0
34d6a02000-34d6c01000 ---p 00002000 fd:00 1129
  /usr/lib64/libXcomposite.so.1.0.0
34d6c01000-34d6c02000 rw-p 00001000 fd:00 1129
  /usr/lib64/libXcomposite.so.1.0.0
34d6e00000-34d6e99000 r-xp 00000000 fd:00 1132
  /usr/lib64/libgdk-x11-2.0.so.0.1600.5
34d6e99000-34d7099000 ---p 00099000 fd:00 1132
  /usr/lib64/libgdk-x11-2.0.so.0.1600.5
34d7099000-34d709e000 rw-p 00099000 fd:00 1132
  /usr/lib64/libgdk-x11-2.0.so.0.1600.5
34d7200000-34d7243000 r-xp 00000000 fd:00 1109
  /usr/lib64/libpixman-1.so.0.14.0
34d7243000-34d7442000 ---p 00043000 fd:00 1109
  /usr/lib64/libpixman-1.so.0.14.0
34d7442000-34d7445000 rw-p 00042000 fd:00 1109
  /usr/lib64/libpixman-1.so.0.14.0
34d7600000-34d761d000 r-xp 00000000 fd:00 1131
  /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5
34d761d000-34d781c000 ---p 0001d000 fd:00 1131
  /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5
34d781c000-34d781d000 rw-p 0001c000 fd:00 1131
  /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5
34d7a00000-34d7a08000 r-xp 00000000 fd:00 1126
  /usr/lib64/libXrandr.so.2.2.0
34d7a08000-34d7c07000 ---p 00008000 fd:00 1126
  /usr/lib64/libXrandr.so.2.2.0
34d7c07000-34d7c08000 rw-p 00007000 fd:00 1126
  /usr/lib64/libXrandr.so.2.2.0
34d7e00000-34d7e02000 r-xp 00000000 fd:00 1130
  /usr/lib64/libXdamage.so.1.1.0
34d7e02000-34d8001000 ---p 00002000 fd:00 1130
  /usr/lib64/libXdamage.so.1.1.0
34d8001000-34d8002000 rw-p 00001000 fd:00 1130
  /usr/lib64/libXdamage.so.1.1.0
34d8200000-34d8209000 r-xp 00000000 fd:00 1125
  /usr/lib64/libXi.so.6.0.0
34d8209000-34d8409000 ---p 00009000 fd:00 1125
  /usr/lib64/libXi.so.6.0.0
34d8409000-34d840a000 rw-p 00009000 fd:00 1125
  /usr/lib64/libXi.so.6.0.0
34d8600000-34d8602000 r-xp 00000000 fd:00 1124
  /usr/lib64/libXinerama.so.1.0.0
34d8602000-34d8801000 ---p 00002000 fd:00 1124
  /usr/lib64/libXinerama.so.1.0.0
34d8801000-34d8802000 rw-p 00001000 fd:00 1124
  /usr/lib64/libXinerama.so.1.0.0
34d8a00000-34d8a05000 r-xp 00000000 fd:00 1127
  /usr/lib64/libXfixes.so.3.1.0
34d8a05000-34d8c04000 ---p 00005000 fd:00 1127
  /usr/lib64/libXfixes.so.3.1.0
34d8c04000-34d8c05000 rw-p 00004000 fd:00 1127
  /usr/lib64/libXfixes.so.3.1.0
34d8e00000-34d91d6000 r-xp 00000000 fd:00 1134
  /usr/lib64/libgtk-x11-2.0.so.0.1600.5
34d91d6000-34d93d5000 ---p 003d6000 fd:00 1134
  /usr/lib64/libgtk-x11-2.0.so.0.1600.5
34d93d5000-34d93e0000 rw-p 003d5000 fd:00 1134
  /usr/lib64/libgtk-x11-2.0.so.0.1600.5
34d93e0000-34d93e2000 rw-p 00000000 00:00 0
34d9400000-34d941d000 r-xp 00000000 fd:00 1133
  /usr/lib64/libatk-1.0.so.0.2511.1
34d941d000-34d961c000 ---p 0001d000 fd:00 1133
  /usr/lib64/libatk-1.0.so.0.2511.1
34d961c000-34d961f000 rw-p 0001c000 fd:00 1133
  /usr/lib64/libatk-1.0.so.0.2511.1
34d9800000-34d980b000 r-xp 00000000 fd:00 1122
  /usr/lib64/libpangocairo-1.0.so.0.2400.5
34d980b000-34d9a0a000 ---p 0000b000 fd:00 1122
  /usr/lib64/libpangocairo-1.0.so.0.2400.5
34d9a0a000-34d9a0b000 rw-p 0000a000 fd:00 1122
  /usr/lib64/libpangocairo-1.0.so.0.2400.5
34d9c00000-34d9c20000 r-xp 00000000 fd:00 1144
  /usr/lib64/libdbus-glib-1.so.2.1.0
34d9c20000-34d9e1f000 ---p 00020000 fd:00 1144
  /usr/lib64/libdbus-glib-1.so.2.1.0
34d9e1f000-34d9e21000 rw-p 0001f000 fd:00 1144
  /usr/lib64/libdbus-glib-1.so.2.1.0
34da000000-34da003000 r-xp 00000000 fd:00 16360
  /lib64/libuuid.so.1.2
34da003000-34da203000 ---p 00003000 fd:00 16360
  /lib64/libuuid.so.1.2
34da203000-34da204000 rw-p 00003000 fd:00 16360
  /lib64/libuuid.so.1.2
34da800000-34da85d000 r-xp 00000000 fd:00 1145
  /usr/lib64/libORBit-2.so.0.1.0
34da85d000-34daa5c000 ---p 0005d000 fd:00 1145
  /usr/lib64/libORBit-2.so.0.1.0
34daa5c000-34daa6f000 rw-p 0005c000 fd:00 1145
  /usr/lib64/libORBit-2.so.0.1.0
34db000000-34db039000 r-xp 00000000 fd:00 1146
  /usr/lib64/libgconf-2.so.4.1.5
34db039000-34db239000 ---p 00039000 fd:00 1146
  /usr/lib64/libgconf-2.so.4.1.5
34db239000-34db23e000 rw-p 00039000 fd:00 1146
  /usr/lib64/libgconf-2.so.4.1.5
34db400000-34db407000 r-xp 00000000 fd:00 16361
  /usr/lib64/libSM.so.6.0.0
34db407000-34db607000 ---p 00007000 fd:00 16361
  /usr/lib64/libSM.so.6.0.0
34db607000-34db608000 rw-p 00007000 fd:00 16361
  /usr/lib64/libSM.so.6.0.0
34db800000-34db817000 r-xp 00000000 fd:00 16359
  /usr/lib64/libICE.so.6.3.0
34db817000-34dba17000 ---p 00017000 fd:00 16359
  /usr/lib64/libICE.so.6.3.0
34dba17000-34dba18000 rw-p 00017000 fd:00 16359
  /usr/lib64/libICE.so.6.3.0
34dba18000-34dba1c000 rw-p 00000000 00:00 0
34dd000000-34dd019000 r-xp 00000000 fd:00 1139
  /lib64/libgcc_s-4.4.1-20090729.so.1
34dd019000-34dd219000 ---p 00019000 fd:00 1139
  /lib64/libgcc_s-4.4.1-20090729.so.1
34dd219000-34dd21a000 rw-p 00019000 fd:00 1139
  /lib64/libgcc_s-4.4.1-20090729.so.1
34e0000000-34e0005000 r-xp 00000000 fd:00 26294
  /usr/lib64/libXtst.so.6.1.0
34e0005000-34e0205000 ---p 00005000 fd:00 26294
  /usr/lib64/libXtst.so.6.1.0
34e0205000-34e0206000 rw-p 00005000 fd:00 26294
  /usr/lib64/libXtst.so.6.1.0
34e5000000-34e5018000 r-xp 00000000 fd:00 29867
  /usr/lib64/libpolkit.so.2.0.0
34e5018000-34e5218000 ---p 00018000 fd:00 29867
  /usr/lib64/libpolkit.so.2.0.0
34e5218000-34e5219000 rw-p 00018000 fd:00 29867
  /usr/lib64/libpolkit.so.2.0.0
34e5800000-34e5805000 r-xp 00000000 fd:00 29887
  /usr/lib64/libogg.so.0.5.3
34e5805000-34e5a04000 ---p 00005000 fd:00 29887
  /usr/lib64/libogg.so.0.5.3
34e5a04000-34e5a05000 rw-p 00004000 fd:00 29887
  /usr/lib64/libogg.so.0.5.3
34e6400000-34e6408000 r-xp 00000000 fd:00 1177
  /usr/lib64/libltdl.so.7.2.0
34e6408000-34e6608000 ---p 00008000 fd:00 1177
  /usr/lib64/libltdl.so.7.2.0
34e6608000-34e6609000 rw-p 00008000 fd:00 1177
  /usr/lib64/libltdl.so.7.2.0
34e7400000-34e740c000 r-xp 00000000 fd:00 29868
  /usr/lib64/libpolkit-dbus.so.2.0.0
34e740c000-34e760b000 ---p 0000c000 fd:00 29868
  /usr/lib64/libpolkit-dbus.so.2.0.0
34e760b000-34e760c000 rw-p 0000b000 fd:00 29868
  /usr/lib64/libpolkit-dbus.so.2.0.0
34e7800000-34e781f000 r-xp 00000000 fd:00 29888
  /usr/lib64/libvorbis.so.0.4.0
34e781f000-34e7a1e000 ---p 0001f000 fd:00 29888
  /usr/lib64/libvorbis.so.0.4.0
34e7a1e000-34e7a2d000 rw-p 0001e000 fd:00 29888
  /usr/lib64/libvorbis.so.0.4.0
34e7c00000-34e7c0a000 r-xp 00000000 fd:00 29869
  /usr/lib64/libpolkit-grant.so.2.0.0
34e7c0a000-34e7e09000 ---p 0000a000 fd:00 29869
  /usr/lib64/libpolkit-grant.so.2.0.0
34e7e09000-34e7e0a000 rw-p 00009000 fd:00 29869
  /usr/lib64/libpolkit-grant.so.2.0.0
34e8000000-34e8003000 r-xp 00000000 fd:00 29892
  /usr/lib64/libcanberra-gtk.so.0.0.5
34e8003000-34e8203000 ---p 00003000 fd:00 29892
  /usr/lib64/libcanberra-gtk.so.0.0.5
34e8203000-34e8204000 rw-p 00003000 fd:00 29892
  /usr/lib64/libcanberra-gtk.so.0.0.5
34e8800000-34e880f000 r-xp 00000000 fd:00 29891
  /usr/lib64/libcanberra.so.0.1.5
34e880f000-34e8a0e000 ---p 0000f000 fd:00 29891
  /usr/lib64/libcanberra.so.0.1.5
34e8a0e000-34e8a0f000 rw-p 0000e000 fd:00 29891
  /usr/lib64/libcanberra.so.0.1.5
34e9000000-34e9007000 r-xp 00000000 fd:00 29889
  /usr/lib64/libvorbisfile.so.3.2.0
34e9007000-34e9206000 ---p 00007000 fd:00 29889
  /usr/lib64/libvorbisfile.so.3.2.0
34e9206000-34e9207000 rw-p 00006000 fd:00 29889
  /usr/lib64/libvorbisfile.so.3.2.0
34e9400000-34e940d000 r-xp 00000000 fd:00 29890
  /usr/lib64/libtdb.so.1.1.5
34e940d000-34e960c000 ---p 0000d000 fd:00 29890
  /usr/lib64/libtdb.so.1.1.5
34e960c000-34e960d000 rw-p 0000c000 fd:00 29890
  /usr/lib64/libtdb.so.1.1.5
34e9c00000-34e9c0a000 r-xp 00000000 fd:00 29870
  /usr/lib64/libpolkit-gnome.so.0.0.0
34e9c0a000-34e9e0a000 ---p 0000a000 fd:00 29870
  /usr/lib64/libpolkit-gnome.so.0.0.0
34e9e0a000-34e9e0b000 rw-p 0000a000 fd:00 29870
  /usr/lib64/libpolkit-gnome.so.0.0.0
3d14400000-3d14541000 r-xp 00000000 fd:00 114
  /usr/lib64/libxml2.so.2.7.6
3d14541000-3d14740000 ---p 00141000 fd:00 114
  /usr/lib64/libxml2.so.2.7.6
3d14740000-3d1474a000 rw-p 00140000 fd:00 114
  /usr/lib64/libxml2.so.2.7.6
3d1474a000-3d1474b000 rw-p 00000000 00:00 0
3d14c00000-3d14c18000 r-xp 00000000 fd:00 48785
  /usr/lib64/libglade-2.0.so.0.0.7
3d14c18000-3d14e17000 ---p 00018000 fd:00 48785
  /usr/lib64/libglade-2.0.so.0.0.7
3d14e17000-3d14e19000 rw-p 00017000 fd:00 48785
  /usr/lib64/libglade-2.0.so.0.0.7
3d16800000-3d168ed000 r-xp 00000000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d168ed000-3d16aec000 ---p 000ed000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d16aec000-3d16af3000 r--p 000ec000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d16af3000-3d16af5000 rw-p 000f3000 fd:00 22864
  /usr/lib64/libstdc++.so.6.0.12
3d16af5000-3d16b0a000 rw-p 00000000 00:00 0
7f05a3fae000-7f05a3fc1000 r-xp 00000000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a3fc1000-7f05a41c0000 ---p 00013000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a41c0000-7f05a41c1000 r--p 00012000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a41c1000-7f05a41c2000 rw-p 00013000 fd:00 22909
  /usr/lib64/libelf-0.142.so
7f05a41d4000-7f05a41d7000 r-xp 00000000 fd:00 116786
  /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so
7f05a41d7000-7f05a43d6000 ---p 00003000 fd:00 116786
  /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so
7f05a43d6000-7f05a43d7000 rw-p 00002000 fd:00 116786
  /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so
7f05a43d7000-7f05a43db000 r-xp 00000000 fd:00 40602
  /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so
7f05a43db000-7f05a45db000 ---p 00004000 fd:00 40602
  /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so
7f05a45db000-7f05a45dc000 rw-p 00004000 fd:00 40602
  /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so
7f05a45dc000-7f05a45df000 r-xp 00000000 fd:00 82244
  /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so
7f05a45df000-7f05a47de000 ---p 00003000 fd:00 82244
  /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so
7f05a47de000-7f05a47df000 rw-p 00002000 fd:00 82244
  /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so
7f05a47df000-7f05a47fb000 r--p 00000000 fd:00 14540
  /usr/share/locale/ja/LC_MESSAGES/libc.mo
7f05a47fb000-7f05a480d000 r-xp 00000000 fd:00 53032
  /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so
7f05a480d000-7f05a4a0d000 ---p 00012000 fd:00 53032
  /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so
7f05a4a0d000-7f05a4a0e000 rw-p 00012000 fd:00 53032
  /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so
7f05a4a0e000-7f05a4a0f000 ---p 00000000 00:00 0
7f05a4a0f000-7f05a520f000 rw-p 00000000 00:00 0
7f05a520f000-7f05a521b000 r--p 00000000 fd:00 21639
  /usr/share/locale/ja/LC_MESSAGES/glib20.mo
7f05a521b000-7f05a5227000 r-xp 00000000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5227000-7f05a5426000 ---p 0000c000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5426000-7f05a5427000 r--p 0000b000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5427000-7f05a5428000 rw-p 0000c000 fd:00 12418
  /lib64/libnss_files-2.10.1.so
7f05a5428000-7f05a543a000 r--p 00000000 fd:00 25291
  /usr/share/locale/ja/LC_MESSAGES/GConf2.mo
7f05a543a000-7f05a544e000 r--p 00000000 fd:00 40242
  /usr/share/locale/ja/LC_MESSAGES/gtk20.mo
7f05a544e000-7f05aa520000 r--p 00000000 fd:00 14558
  /usr/lib/locale/locale-archive
7f05aa520000-7f05aa538000 rw-p 00000000 00:00 0
7f05aa53f000-7f05aa546000 r--s 00000000 fd:00 12712
  /usr/lib64/gconv/gconv-modules.cache
7f05aa546000-7f05aa54a000 r--p 00000000 fd:00 110980
  /usr/share/locale/ja/LC_MESSAGES/gnome-session-2.0.mo
7f05aa54a000-7f05aa54c000 rw-p 00000000 00:00 0
7fff45b42000-7fff45b57000 rw-p 00000000 00:00 0                          [stack]
7fff45be4000-7fff45be5000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
  [vsyscall]

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  6:10                 ` KOSAKI Motohiro
@ 2009-10-27  6:34                   ` Minchan Kim
  -1 siblings, 0 replies; 128+ messages in thread
From: Minchan Kim @ 2009-10-27  6:34 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel

On Tue, 27 Oct 2009 15:10:52 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> 2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> > On Mon, 26 Oct 2009 17:16:14 +0100
> > Vedran Furač <vedran.furac@gmail.com> wrote:
> >> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
> >>
> >> It was catastrophe. :) X crashed (or killed) with all the programs, but
> >> my little program was alive for 20 minutes (see timestamps). And for
> >> that time computer was completely unusable. Couldn't even get the
> >> console via ssh. Rally embarrassing for a modern OS to get destroyed by
> >> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
> >> oomk usually kills it also. See for yourself:
> >>
> >> dmesg: http://pastebin.com/f3f83738a
> >> messages: http://pastebin.com/f2091110a
> >>
> >> (CCing to lklm again... I just want people to see the logs.)
> >>
> > Thank you for reporting and your patience. It seems something strange
> > that your KDE programs are killed. I agree.
> >
> > I attached a scirpt for checking oom_score of all exisiting process.
> > (oom_score is a value used for selecting "bad" processs.")
> > please run if you have time.
> >
> > This is a result of my own desktop(on virtual machine.)
> > In this environ (Total memory is 1.6GBytes), mmap(1G) program is running.
> >
> > %check_badness.pl | sort -n | tail
> > --
> > 89924   3938    mixer_applet2
> > 90210   3942    tomboy
> > 94753   3936    clock-applet
> > 101994  3919    pulseaudio
> > 113525  4028    gnome-terminal
> > 127340  1       init
> > 128177  3871    nautilus
> > 151003  11515   bash
> > 256944  11653   mmap
> > 425561  3829    gnome-session
> > --
> > Sigh, gnome-session has twice value of mmap(1G).
> > Of course, gnome-session only uses 6M bytes of anon.
> > I wonder this is because gnome-session has many children..but need to
> > dig more. Does anyone has idea ?
> > (CCed kosaki)
> 
> Following output address the issue.
> The fact is, modern desktop application linked pretty many library. it
> makes bloat VSS size and increase
> OOM score.
> 
> Ideally, We shouldn't account evictable file-backed mappings for oom_score.
> 
Hmm.
I wonder why we consider VM size for OOM killing.
How about RSS size?
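
To make the VM size versus RSS question concrete, here is a rough userspace sketch in Python. It is not the check_badness.pl script mentioned above (that script is not shown in this thread); it only assumes the standard /proc/<pid>/oom_score file and the VmSize and VmRSS fields of /proc/<pid>/status. The helper names parse_status_kb and list_processes are made up for this illustration.

```python
import os


def parse_status_kb(status_text):
    """Return the kB-valued fields of a /proc/<pid>/status dump as a dict."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Memory fields look like "VmRSS:      6144 kB".
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def list_processes(proc_root="/proc"):
    """Yield (oom_score, pid, name, VmSize kB, VmRSS kB) for readable processes."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read())
            with open(os.path.join(base, "status")) as f:
                text = f.read()
        except (OSError, ValueError):
            continue  # the process exited or is not readable
        name = ""
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
                break
        mem = parse_status_kb(text)
        yield score, int(entry), name, mem.get("VmSize", 0), mem.get("VmRSS", 0)


if __name__ == "__main__" and os.path.isdir("/proc"):
    # Print the ten processes with the highest oom_score, lowest first.
    for row in sorted(list_processes())[-10:]:
        print("%d\t%d\t%s\tVmSize=%d kB\tVmRSS=%d kB" % row)
```

A process that maps many shared libraries would be expected to show a VmSize far above its VmRSS in this listing, which is the gap being asked about.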


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  6:34                   ` Minchan Kim
@ 2009-10-27  6:36                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-27  6:36 UTC (permalink / raw)
  To: Minchan Kim; +Cc: KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel

On Tue, 27 Oct 2009 15:34:29 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, 27 Oct 2009 15:10:52 +0900
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > 2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> > > On Mon, 26 Oct 2009 17:16:14 +0100
> > > Vedran Furač <vedran.furac@gmail.com> wrote:
> > >> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
> > >>
> > >> It was catastrophe. :) X crashed (or killed) with all the programs, but
> > >> my little program was alive for 20 minutes (see timestamps). And for
> > >> that time computer was completely unusable. Couldn't even get the
> > >> console via ssh. Rally embarrassing for a modern OS to get destroyed by
> > >> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
> > >> oomk usually kills it also. See for yourself:
> > >>
> > >> dmesg: http://pastebin.com/f3f83738a
> > >> messages: http://pastebin.com/f2091110a
> > >>
> > >> (CCing to lklm again... I just want people to see the logs.)
> > >>
> > > Thank you for reporting and your patience. It seems something strange
> > > that your KDE programs are killed. I agree.
> > >
> > > I attached a scirpt for checking oom_score of all exisiting process.
> > > (oom_score is a value used for selecting "bad" processs.")
> > > please run if you have time.
> > >
> > > This is a result of my own desktop(on virtual machine.)
> > > In this environ (Total memory is 1.6GBytes), mmap(1G) program is running.
> > >
> > > %check_badness.pl | sort -n | tail
> > > --
> > > 89924   3938    mixer_applet2
> > > 90210   3942    tomboy
> > > 94753   3936    clock-applet
> > > 101994  3919    pulseaudio
> > > 113525  4028    gnome-terminal
> > > 127340  1       init
> > > 128177  3871    nautilus
> > > 151003  11515   bash
> > > 256944  11653   mmap
> > > 425561  3829    gnome-session
> > > --
> > > Sigh, gnome-session has twice value of mmap(1G).
> > > Of course, gnome-session only uses 6M bytes of anon.
> > > I wonder this is because gnome-session has many children..but need to
> > > dig more. Does anyone has idea ?
> > > (CCed kosaki)
> > 
> > Following output address the issue.
> > The fact is, modern desktop application linked pretty many library. it
> > makes bloat VSS size and increase
> > OOM score.
> > 
> > Ideally, We shouldn't account evictable file-backed mappings for oom_score.
> > 
> Hmm. 
> I wonder why we consider VM size for OOM kiling. 
> How about RSS size?
> 

Maybe the current code assumes that tons of swap have already been generated
by the time oom-kill is invoked. In that case, using only mm->anon_rss would
not be correct.

Hm, should we count the number of swap entries referenced from the mm?
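
For reference, a userspace approximation of "swap entries referenced from an mm" can be sketched by summing the per-mapping "Swap:" fields of /proc/<pid>/smaps. This is only an illustration under that assumption, not the kernel-side accounting being proposed; the function names swap_kb_from_smaps and read_swap_kb are hypothetical.

```python
def swap_kb_from_smaps(smaps_text):
    """Sum the per-mapping "Swap:" fields of a /proc/<pid>/smaps dump, in kB."""
    total = 0
    for line in smaps_text.splitlines():
        # Match "Swap:" exactly; other fields such as "SwapPss:" are skipped.
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total


def read_swap_kb(pid):
    """Read /proc/<pid>/smaps and return the swapped-out total in kB."""
    with open("/proc/%d/smaps" % pid) as f:
        return swap_kb_from_smaps(f.read())
```

Reading smaps walks every mapping of the target process, so this is a diagnostic aid rather than something cheap enough to run at OOM time.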

Regards,
-Kame



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  6:36                     ` KAMEZAWA Hiroyuki
@ 2009-10-27  6:55                       ` Minchan Kim
  -1 siblings, 0 replies; 128+ messages in thread
From: Minchan Kim @ 2009-10-27  6:55 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel

On Tue, Oct 27, 2009 at 3:36 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 27 Oct 2009 15:34:29 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Tue, 27 Oct 2009 15:10:52 +0900
>> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>>
>> > 2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
>> > > On Mon, 26 Oct 2009 17:16:14 +0100
>> > > Vedran Furač <vedran.furac@gmail.com> wrote:
>> > >> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>> > >>
>> > >> It was catastrophe. :) X crashed (or killed) with all the programs, but
>> > >> my little program was alive for 20 minutes (see timestamps). And for
>> > >> that time computer was completely unusable. Couldn't even get the
>> > >> console via ssh. Rally embarrassing for a modern OS to get destroyed by
>> > >> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
>> > >> oomk usually kills it also. See for yourself:
>> > >>
>> > >> dmesg: http://pastebin.com/f3f83738a
>> > >> messages: http://pastebin.com/f2091110a
>> > >>
>> > >> (CCing to lklm again... I just want people to see the logs.)
>> > >>
>> > > Thank you for reporting and your patience. It seems something strange
>> > > that your KDE programs are killed. I agree.
>> > >
>> > > I attached a scirpt for checking oom_score of all exisiting process.
>> > > (oom_score is a value used for selecting "bad" processs.")
>> > > please run if you have time.
>> > >
>> > > This is a result of my own desktop(on virtual machine.)
>> > > In this environ (Total memory is 1.6GBytes), mmap(1G) program is running.
>> > >
>> > > %check_badness.pl | sort -n | tail
>> > > --
>> > > 89924   3938    mixer_applet2
>> > > 90210   3942    tomboy
>> > > 94753   3936    clock-applet
>> > > 101994  3919    pulseaudio
>> > > 113525  4028    gnome-terminal
>> > > 127340  1       init
>> > > 128177  3871    nautilus
>> > > 151003  11515   bash
>> > > 256944  11653   mmap
>> > > 425561  3829    gnome-session
>> > > --
>> > > Sigh, gnome-session has twice value of mmap(1G).
>> > > Of course, gnome-session only uses 6M bytes of anon.
>> > > I wonder this is because gnome-session has many children..but need to
>> > > dig more. Does anyone has idea ?
>> > > (CCed kosaki)
>> >
>> > Following output address the issue.
>> > The fact is, modern desktop application linked pretty many library. it
>> > makes bloat VSS size and increase
>> > OOM score.
>> >
>> > Ideally, We shouldn't account evictable file-backed mappings for oom_score.
>> >
>> Hmm.
>> I wonder why we consider VM size for OOM kiling.
>> How about RSS size?
>>
>
> Maybe the current code assumes "Tons of swap have been generated, already" if
> oom-kill is invoked. Then, just using mm->anon_rss will not be correct.
>
> Hm, should we count # of swap entries reference from mm ?....

In Vedran's case, he didn't use swap, so considering only the VM size is the
problem. I think it would be better to consider both RSS and the number of
swap entries, as Kosaki mentioned.
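
A small Python sketch of how the two ranking criteria differ. The process names and sizes below are invented purely for illustration (they are not measurements from this thread), and by_vm_size and by_rss_plus_swap are hypothetical helpers, not kernel functions.

```python
from collections import namedtuple

# Sizes are in kB; every value below is made up for illustration only.
Proc = namedtuple("Proc", "name vm_kb rss_kb swap_kb")


def by_vm_size(procs):
    """Rank candidates by total virtual size, largest first."""
    return sorted(procs, key=lambda p: p.vm_kb, reverse=True)


def by_rss_plus_swap(procs):
    """Rank candidates by resident pages plus swapped-out pages, largest first."""
    return sorted(procs, key=lambda p: p.rss_kb + p.swap_kb, reverse=True)


procs = [
    # Many mapped libraries, little memory actually used.
    Proc("session-manager", vm_kb=400000, rss_kb=6000, swap_kb=0),
    # The process that is really consuming memory.
    Proc("memory-hog", vm_kb=250000, rss_kb=240000, swap_kb=0),
    # Mostly swapped out, so RSS alone would miss it.
    Proc("swapped-out-bomb", vm_kb=100000, rss_kb=2000, swap_kb=90000),
]

print([p.name for p in by_vm_size(procs)])
print([p.name for p in by_rss_plus_swap(procs)])
```

With these invented numbers, ranking by VM size puts session-manager first, while ranking by RSS plus swap puts memory-hog first and still places swapped-out-bomb ahead of session-manager.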


>
> Regards,
> -Kame
>
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  6:34                   ` Minchan Kim
@ 2009-10-27  6:46                     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 128+ messages in thread
From: KOSAKI Motohiro @ 2009-10-27  6:46 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, vedran.furac, linux-mm,
	linux-kernel

> > > %check_badness.pl | sort -n | tail
> > > --
> > > 89924   3938    mixer_applet2
> > > 90210   3942    tomboy
> > > 94753   3936    clock-applet
> > > 101994  3919    pulseaudio
> > > 113525  4028    gnome-terminal
> > > 127340  1       init
> > > 128177  3871    nautilus
> > > 151003  11515   bash
> > > 256944  11653   mmap
> > > 425561  3829    gnome-session
> > > --
> > > Sigh, gnome-session has twice value of mmap(1G).
> > > Of course, gnome-session only uses 6M bytes of anon.
> > > I wonder this is because gnome-session has many children..but need to
> > > dig more. Does anyone has idea ?
> > > (CCed kosaki)
> > 
> > Following output address the issue.
> > The fact is, modern desktop application linked pretty many library. it
> > makes bloat VSS size and increase
> > OOM score.
> > 
> > Ideally, We shouldn't account evictable file-backed mappings for oom_score.
> > 
> Hmm. 
> I wonder why we consider VM size for OOM kiling. 
> How about RSS size?

Because a swapped-out bad process (e.g. a fork bomb) should still be
killed by the OOM killer.
RSS + swap entries is acceptable to me.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  6:46                     ` KOSAKI Motohiro
@ 2009-10-27  6:56                       ` Minchan Kim
  -1 siblings, 0 replies; 128+ messages in thread
From: Minchan Kim @ 2009-10-27  6:56 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Minchan Kim, KAMEZAWA Hiroyuki, vedran.furac, linux-mm,
	linux-kernel

On Tue, 27 Oct 2009 15:46:36 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > > > %check_badness.pl | sort -n | tail
> > > > --
> > > > 89924   3938    mixer_applet2
> > > > 90210   3942    tomboy
> > > > 94753   3936    clock-applet
> > > > 101994  3919    pulseaudio
> > > > 113525  4028    gnome-terminal
> > > > 127340  1       init
> > > > 128177  3871    nautilus
> > > > 151003  11515   bash
> > > > 256944  11653   mmap
> > > > 425561  3829    gnome-session
> > > > --
> > > > Sigh, gnome-session has twice the value of mmap(1G).
> > > > Of course, gnome-session only uses 6M bytes of anon.
> > > > I wonder if this is because gnome-session has many children, but I need to
> > > > dig more. Does anyone have an idea?
> > > > (CCed kosaki)
> > > 
> > > The following output addresses the issue.
> > > The fact is, modern desktop applications link against a great many libraries,
> > > which bloats the VSS size and increases the
> > > OOM score.
> > > 
> > > Ideally, we shouldn't account evictable file-backed mappings in oom_score.
> > > 
> > Hmm.
> > I wonder why we consider VM size for OOM killing.
> > How about RSS size?
> 
> Because a swapped-out bad process (e.g. a fork bomb) should still
> be killed by the OOM killer.
> RSS + swap entries is acceptable to me.

That seems reasonable to me.
As I mentioned in my reply to Kame, in Vedran's case he didn't use swap.
I think considering only VM size is the problem.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  3:22               ` KAMEZAWA Hiroyuki
@ 2009-10-27 17:12                 ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-27 17:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, kosaki.motohiro@jp.fujitsu.com,
	hugh.dickins, akpm, rientjes

KAMEZAWA Hiroyuki wrote:

> On Mon, 26 Oct 2009 17:16:14 +0100
> Vedran Furač <vedran.furac@gmail.com> wrote:
>>>  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>> It was a catastrophe. :) X crashed (or was killed) along with all the programs, but
>> my little program stayed alive for 20 minutes (see timestamps). And for
>> that time the computer was completely unusable. Couldn't even get a
>> console via ssh. Really embarrassing for a modern OS to get destroyed by
>> 5 lines of C run as an ordinary user. Luckily screen was still alive;
>> the OOM killer usually kills it too. See for yourself:
>>
>> dmesg: http://pastebin.com/f3f83738a
>> messages: http://pastebin.com/f2091110a
>>
>> (CCing to lkml again... I just want people to see the logs.)
>>
> Thank you for reporting and for your patience. It does seem strange
> that your KDE programs are killed. I agree.

No problem. I want this to be solved as much as you do. Actually, it is
not strange, just a buggy algorithm.

Run:

% ps -T -eo pid,ppid,tid,vsz,command

You'll see that the ppid of a number of processes is kdeinit, gnome-session,
fvwm or something else, depending on what one is using. All of these
processes are started automatically during startup, or manually by clicking
on a menu item or by some keyboard shortcut. The OOM algorithm just sums the
memory usage of all of them and adds that to the parent. Just plain wrong.

Also, it seems it's looking at VIRT instead of RES.

> I attached a script for checking the oom_score of all existing processes.
> (oom_score is the value used for selecting a "bad" process.)
> Please run it if you have time.

96890   21463   VirtualBox // OK
118615  11144   kded4 // WRONG
127455  11158   knotify4 // WRONG
132198  1       init // WRONG
133940  11151   ksmserver // WRONG
134109  11224   audacious2 // Audio player, maybe
145476  21503   VirtualBox // OK
174939  11322   icedove-bin // thunderbird, maybe
178015  11223   akregator // rss reader, maybe
201043  22672   krusader  // WRONG
212609  11187   krunner // WRONG
256911  24252   test // culprit, malloced 1GB
1750371 11318   run-mozilla.sh // tiny, parent of firefox threads
2044902 11141   kdeinit4 // tiny, parent of most KDE apps

> Sigh, gnome-session has twice value of mmap(1G).
> Of course, gnome-session only uses 6M bytes of anon.
> I wonder this is because gnome-session has many children..but need to

Yes it is.

Regards,

Vedran

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27 17:12                 ` Vedran Furač
  (?)
@ 2009-10-27 18:02                 ` KOSAKI Motohiro
  2009-10-27 18:30                     ` Vedran Furač
  -1 siblings, 1 reply; 128+ messages in thread
From: KOSAKI Motohiro @ 2009-10-27 18:02 UTC (permalink / raw)
  To: vedran.furac
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, hugh.dickins, akpm,
	rientjes

[-- Attachment #1: Type: text/plain, Size: 1098 bytes --]

>> I attached a script for checking the oom_score of all existing processes.
>> (oom_score is the value used for selecting a "bad" process.)
>> Please run it if you have time.
>
> 96890   21463   VirtualBox // OK
> 118615  11144   kded4 // WRONG
> 127455  11158   knotify4 // WRONG
> 132198  1       init // WRONG
> 133940  11151   ksmserver // WRONG
> 134109  11224   audacious2 // Audio player, maybe
> 145476  21503   VirtualBox // OK
> 174939  11322   icedove-bin // thunderbird, maybe
> 178015  11223   akregator // rss reader, maybe
> 201043  22672   krusader  // WRONG
> 212609  11187   krunner // WRONG
> 256911  24252   test // culprit, malloced 1GB
> 1750371 11318   run-mozilla.sh // tiny, parent of firefox threads
> 2044902 11141   kdeinit4 // tiny, parent of most KDE apps

Vedran, I've put together an alternative improvement. Could you please measure
the badness scores on your system?
With luck, your culprit process will get the largest badness value.

Note: this patch changes time-related behavior, so please drink a cup of
coffee before measuring.
Letting the system run for a little while gives a more accurate test result.

[-- Attachment #2: 0001-oom-oom-score-bonus-by-run_time-use-proportional-va.patch --]
[-- Type: application/octet-stream, Size: 3025 bytes --]

From 047e6647f580a7c9bed2ac547bc9b15154d5da4c Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 28 Oct 2009 02:25:01 +0900
Subject: [PATCH] oom: oom-score bonus by run_time use proportional value

Currently, the oom-score bonus by run_time uses the formula "sqrt(sqrt(runtime / 1024))".
This means a process's oom-score drops to 1/3 after one day of run time. This feature exists
to protect several important system daemons.

However, a typical desktop user reboots the system every day, so the bonus is too small.
This bonus only works well on server systems. IOW, typical uptime strongly depends on the
use case, so it shouldn't be used as an oom modifier.

Instead, this patch uses a run_time value proportional to uptime.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/proc/base.c |    1 +
 mm/oom_kill.c  |   26 +++++++++++++++-----------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 837469a..17d6fd4 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -446,6 +446,7 @@ static int proc_oom_score(struct task_struct *task, char *buffer)
 	struct timespec uptime;
 
 	do_posix_clock_monotonic_gettime(&uptime);
+	monotonic_to_bootbased(&uptime);
 	read_lock(&tasklist_lock);
 	points = badness(task->group_leader, uptime.tv_sec);
 	read_unlock(&tasklist_lock);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ea2147d..3c1b3a3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -69,10 +69,10 @@ static int has_intersects_mems_allowed(struct task_struct *tsk)
  *    algorithm has been meticulously tuned to meet the principle
  *    of least surprise ... (be careful when you change it)
  */
-
 unsigned long badness(struct task_struct *p, unsigned long uptime)
 {
-	unsigned long points, cpu_time, run_time;
+	unsigned long points, cpu_time;
+	unsigned long run_time = 0;
 	struct mm_struct *mm;
 	struct task_struct *child;
 	int oom_adj = p->signal->oom_adj;
@@ -130,17 +130,20 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
 	utime = cputime_to_jiffies(task_time.utime);
 	stime = cputime_to_jiffies(task_time.stime);
 	cpu_time = (utime + stime) >> (SHIFT_HZ + 3);
-
-
-	if (uptime >= p->start_time.tv_sec)
-		run_time = (uptime - p->start_time.tv_sec) >> 10;
-	else
-		run_time = 0;
-
 	if (cpu_time)
 		points /= int_sqrt(cpu_time);
-	if (run_time)
-		points /= int_sqrt(int_sqrt(run_time));
+
+	if (uptime <= p->real_start_time.tv_sec) {
+		/* Baby process may be not so important. */
+		points *= 2;
+	} else {
+		run_time = (uptime - p->real_start_time.tv_sec);
+		if (!run_time)
+			run_time = 1;
+
+		run_time = ((run_time * 100) / uptime) + 1;
+		points /= int_sqrt(run_time);
+	}
 
 	/*
 	 * Niced processes are most likely less important, so double
@@ -233,6 +236,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 	*ppoints = 0;
 
 	do_posix_clock_monotonic_gettime(&uptime);
+	monotonic_to_bootbased(&uptime);
 	for_each_process(p) {
 		unsigned long points;
 
-- 
1.6.2.5
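
The arithmetic in the patch above can be checked with a small userspace
sketch. This is an illustration only, not the kernel code: isqrt() is a
simple stand-in for the kernel's int_sqrt(), and old_runtime_scale() and
new_runtime_scale() are hypothetical names that mirror just the run_time
part of badness() before and after the patch. All times are in seconds.

```c
#include <assert.h>

/* Integer floor square root; stand-in for the kernel's int_sqrt(). */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while (r + 1 <= x / (r + 1))
		r++;
	return r;
}

/* Old behavior: divide points by sqrt(sqrt(run_time / 1024)). */
unsigned long old_runtime_scale(unsigned long points,
				unsigned long uptime, unsigned long start)
{
	unsigned long run_time = 0;

	if (uptime >= start)
		run_time = (uptime - start) >> 10;
	if (run_time)
		points /= isqrt(isqrt(run_time));
	return points;
}

/* Patched behavior: divide by sqrt of run_time as a percentage of uptime. */
unsigned long new_runtime_scale(unsigned long points,
				unsigned long uptime, unsigned long start)
{
	unsigned long run_time;

	if (uptime <= start)
		return points * 2;	/* brand-new process: doubled */

	assert(uptime > 0);
	run_time = uptime - start;
	run_time = ((run_time * 100) / uptime) + 1;
	return points / isqrt(run_time);
}
```

For a process that has run for the whole of a one-day uptime (86400
seconds), the old formula divides the points by 3, matching the commit
message, while the patched one divides them by 10. After only one hour of
uptime the old formula gives no reduction at all, whereas the patched one
still divides by 10 for a process started at boot.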


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27 18:02                 ` KOSAKI Motohiro
@ 2009-10-27 18:30                     ` Vedran Furač
  0 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-27 18:30 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, hugh.dickins, akpm,
	rientjes

KOSAKI Motohiro wrote:

>>> I attached a script for checking the oom_score of all existing processes.
>>> (oom_score is the value used for selecting a "bad" process.)
>>> Please run it if you have time.
>> 96890   21463   VirtualBox // OK
>> 118615  11144   kded4 // WRONG
>> 127455  11158   knotify4 // WRONG
>> 132198  1       init // WRONG
>> 133940  11151   ksmserver // WRONG
>> 134109  11224   audacious2 // Audio player, maybe
>> 145476  21503   VirtualBox // OK
>> 174939  11322   icedove-bin // thunderbird, maybe
>> 178015  11223   akregator // rss reader, maybe
>> 201043  22672   krusader  // WRONG
>> 212609  11187   krunner // WRONG
>> 256911  24252   test // culprit, malloced 1GB
>> 1750371 11318   run-mozilla.sh // tiny, parent of firefox threads
>> 2044902 11141   kdeinit4 // tiny, parent of most KDE apps
> 
> Vedran, I've put together an alternative improvement. Could you please measure
> the badness scores on your system?
> With luck, your culprit process will get the largest badness value.

Thanks, I'll test it during the week. But note that not every user
reboots their computer every day. I, for example, usually have it up for
days. And when it comes to my laptop - weeks, as I just suspend it when
I don't use it. Maybe the best way is to combine the two patches. Also, you
and others could test these patches too. It is not only my kernel that
behaves strangely. :)

> Note: this patch changes time-related behavior, so please drink a cup of
> coffee before measuring.
> Letting the system run for a little while gives a more accurate test result.

OK. :)

Regards,

Vedran


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27  3:22               ` KAMEZAWA Hiroyuki
@ 2009-10-27 20:44                 ` Hugh Dickins
  -1 siblings, 0 replies; 128+ messages in thread
From: Hugh Dickins @ 2009-10-27 20:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, linux-mm, linux-kernel, kosaki.motohiro,
	minchan.kim, akpm, rientjes, aarcange

On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> Sigh, gnome-session has twice value of mmap(1G).
> Of course, gnome-session only uses 6M bytes of anon.
> I wonder this is because gnome-session has many children..but need to
> dig more. Does anyone has idea ?

When preparing KSM unmerge to handle OOM, I looked at how the precedent
was handled by running a little program which mmaps an anonymous region
of the same size as physical memory, then tries to mlock it.  The
program was such an obvious candidate to be killed, I was shocked
by the poor decisions the OOM killer made.  Usually I ran it with
mem=512M, with gnome and firefox active.  Often the OOM killer killed
it right the first time, but went wrong when I tried it a second time
(I think that's because of what's already swapped out the first time).

I built up a patchset of fixes, but once I came to split them up for
submission, not one of them seemed entirely satisfactory; and Andrea's
fix to the KSM/mlock deadlock forced me to abandon even the first of
the patches (we've since then fixed the way munlocking behaves, so
in theory could revisit that; but Andrea disliked what I was trying
to do there in KSM for other reasons, so I've not touched it since).
I had to get on with KSM, so I set it all aside: none of the issues
was a recent regression.

I did briefly wonder about the reliance on total_vm which you're now
looking into, but didn't touch that at all.  Let me describe those
issues which I did try but fail to fix - I've no more time to deal
with them now than then, but ought at least to mention them to you.

1.  select_bad_process() tries to avoid killing another process while
there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
processes.  However, p->mm is set to NULL well before p reaches
exit_mmap() to actually free the memory, and there may be significant
delays in between (I think exit_robust_list() gave me a hang at one
stage).  So in practice, even when the OOM killer selects the right
process to kill, there can be lots of collateral damage from it not
waiting long enough for that process to give up its memory.

I tried to deal with that by moving the TIF_MEMDIE test up before
the p->mm test, but adding in a check on p->exit_state:
		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
		    !p->exit_state)
			return ERR_PTR(-1UL);
But this is then liable to hang the system if there's some reason
why the selected process cannot proceed to free its memory (e.g.
the current KSM unmerge case).  It needs to wait "a while", but
give up if no progress is made, instead of hanging: originally
I thought that setting PF_MEMALLOC more widely in page_alloc.c,
and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
would deal with that; but we cannot be sure that waiting for memory
is the only reason for a holdup there (in the KSM unmerge case it's
waiting for an mmap_sem, and there may well be other such cases).

2.  I started out running my mlock test program as root (later
switched to use "ulimit -l unlimited" first).  But badness() reckons
CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
and CAP_SYS_RAWIO another reason to quarter your points: so running
as root makes you sixteen times less likely to be killed.  Quartering
is anyway debatable, but sixteenthing seems utterly excessive to me.

I moved the CAP_SYS_RAWIO test in with the others, so it does no
more than quartering; but is quartering appropriate anyway?  I did
wonder if I was right to be "subverting" the fine-grained CAPs in
this way, but have since seen unrelated mail from one who knows
better, implying they're something of a fantasy, that su and sudo
are indeed what's used in the real world.  Maybe this patch was okay.

3.  badness() has a comment above it which says:  
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
But Andrea's 2.6.11 86a4c6d9e2e43796bb362debd3f73c0e3b198efa (later
refined by Kurt's 2.6.16 9827b781f20828e5ceb911b879f268f78fe90815)
adds plenty of surprise there, by trying to factor children into the
calculation.  Intended to deal with forkbombs, but any reasonable
process whose purpose is to fork children (e.g. gnome-session)
becomes very vulnerable.  And whereas badness() itself goes on to
refine the total_vm points by various adjustments peculiar to the
process in question, those refinements have been ignored when
adding the child's total_vm/2.  (Andrea does remark that he'd
rather have rewritten badness() from scratch.)

I tried to fix this by moving the PF_OOM_ORIGIN (was PF_SWAPOFF)
part of the calculation up to select_bad_process(), making a
solo_badness() function which makes all those adjustments to
total_vm, then badness() itself a simple function adding half
the children's solo_badness()es to the process' own solo_badness().
But probably lots more needs doing - Andrea's rewrite?

4.  In some cases those children are sharing exactly the same mm,
yet its total_vm is being added again and again to the points:
I had a nasty inner loop searching back to see if we'd already
counted this mm (but then, what if the different tasks sharing
the mm deserved different adjustments to the total_vm?).
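
Points 3 and 4 above can be illustrated with a simplified userspace model.
This is a sketch under stated assumptions, not the kernel's code: struct
task, mm_id and both functions are hypothetical, and the only behavior
taken from the description above is that each child adds half of its
total_vm to the parent's points. Skipping children that share the parent's
own mm is an assumption of this model. The second function shows one
possible way of counting each distinct mm only once, in the spirit of the
inner loop mentioned in point 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task model. */
struct task {
	int mm_id;                    /* stand-in for the mm pointer */
	unsigned long total_vm;       /* mapped pages */
	const struct task *children;  /* array of direct children */
	size_t nr_children;
};

/* Every child with a different mm adds total_vm/2, even if repeated. */
unsigned long badness_with_children(const struct task *p)
{
	unsigned long points;

	assert(p != NULL);
	points = p->total_vm;
	for (size_t i = 0; i < p->nr_children; i++) {
		const struct task *c = &p->children[i];

		if (c->mm_id != p->mm_id)
			points += c->total_vm / 2;
	}
	return points;
}

/* Same idea, but an mm shared by several children is counted once. */
unsigned long badness_dedup(const struct task *p)
{
	unsigned long points;

	assert(p != NULL);
	points = p->total_vm;
	for (size_t i = 0; i < p->nr_children; i++) {
		const struct task *c = &p->children[i];
		int seen = (c->mm_id == p->mm_id);

		/* Search back through the children already counted. */
		for (size_t j = 0; j < i && !seen; j++)
			seen = (p->children[j].mm_id == c->mm_id);
		if (!seen)
			points += c->total_vm / 2;
	}
	return points;
}
```

In this model a small parent with a few large children ends up with far
more points than its own total_vm, which is the gnome-session effect
described earlier, and two children sharing one mm inflate the total
further unless the shared mm is counted only once.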

I hope these notes help someone towards a better solution
(and be prepared to discover more on the way).  I agree with
Vedran that the present behaviour is pretty unimpressive, and
I'm puzzled as to how people can have been tinkering with
oom_kill.c down the years without seeing any of this.

Hugh

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27 20:44                 ` Hugh Dickins
@ 2009-10-27 21:04                   ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-27 21:04 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009, Hugh Dickins wrote:

> When preparing KSM unmerge to handle OOM, I looked at how the precedent
> was handled by running a little program which mmaps an anonymous region
> of the same size as physical memory, then tries to mlock it.  The
> program was such an obvious candidate to be killed, I was shocked
> by the poor decisions the OOM killer made.  Usually I ran it with
> mem=512M, with gnome and firefox active.  Often the OOM killer killed
> it right the first time, but went wrong when I tried it a second time
> (I think that's because of what's already swapped out the first time).
> 

The heuristics that the oom killer uses in selecting a task seem to get
debated quite often.

What hasn't been mentioned is that total_vm does do a good job of 
identifying tasks that are using far more memory than expected.  That 
seems to be the initial target: killing a rogue task that is hogging much 
more memory than it should, probably because of a memory leak.

The latest approach seems to be focused more on killing the task that will 
free the most resident memory.  That certainly is understandable to avoid 
killing additional tasks later and avoiding subsequent page allocations in 
the short term, but doesn't help to kill the memory leaker.

There are advantages to either approach, but it depends on the contextual
goal of the oom killer when it's called: kill a rogue task that is
allocating more memory than expected, or kill a task that will free the
most memory.
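
The two goals can pick different victims. A minimal sketch, assuming
made-up task names and page counts (nothing here is taken from a real
system):

```python
from typing import Dict, NamedTuple


class Usage(NamedTuple):
    total_vm: int   # pages of address space mapped
    rss: int        # pages actually resident


def victim_by_total_vm(tasks: Dict[str, Usage]) -> str:
    """Target the task with the largest virtual size."""
    return max(tasks, key=lambda name: tasks[name].total_vm)


def victim_by_rss(tasks: Dict[str, Usage]) -> str:
    """Target the task whose death frees the most resident memory."""
    return max(tasks, key=lambda name: tasks[name].rss)


# Hypothetical workload: one task maps a lot but touches little,
# the other maps less but keeps most of it resident.
tasks = {
    "sparse_mapper": Usage(total_vm=400000, rss=20000),
    "resident_hog": Usage(total_vm=150000, rss=120000),
}
print(victim_by_total_vm(tasks), victim_by_rss(tasks))
```

Here the total_vm policy selects sparse_mapper while the rss policy
selects resident_hog.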

> 1.  select_bad_process() tries to avoid killing another process while
> there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
> processes.  However, p->mm is set to NULL well before p reaches
> exit_mmap() to actually free the memory, and there may be significant
> delays in between (I think exit_robust_list() gave me a hang at one
> stage).  So in practice, even when the OOM killer selects the right
> process to kill, there can be lots of collateral damage from it not
> waiting long enough for that process to give up its memory.
> 
> I tried to deal with that by moving the TIF_MEMDIE test up before
> the p->mm test, but adding in a check on p->exit_state:
> 		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> 		    !p->exit_state)
> 			return ERR_PTR(-1UL);
> But this is then liable to hang the system if there's some reason
> why the selected process cannot proceed to free its memory (e.g.
> the current KSM unmerge case).  It needs to wait "a while", but
> give up if no progress is made, instead of hanging: originally
> I thought that setting PF_MEMALLOC more widely in page_alloc.c,
> and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
> would deal with that; but we cannot be sure that waiting for memory
> is the only reason for a holdup there (in the KSM unmerge case it's
> waiting for an mmap_sem, and there may well be other such cases).
> 

I've proposed an oom killer timeout in the past which adds a jiffies count 
to struct task_struct and will defer killing other tasks until the 
predefined time limit (we use 10*HZ) has been exceeded.  The problem is 
that even if you kill another task, it is highly unlikely that the expired 
task will ever exit at that point and is still holding a substantial 
amount of memory since it also had access to memory reserves and has still 
failed to exit.
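
The deferral rule can be sketched as below. This is only a model of
the idea, not the proposed patch: HZ = 250 is an assumed config value,
the function name is hypothetical, and jiffies wraparound is ignored.

```python
HZ = 250                      # ticks per second; assumed config value
OOM_TIMEOUT = 10 * HZ         # the "10*HZ" limit mentioned above


def may_select_another(victim_marked_at, now):
    """True if the OOM killer may pick a further victim at tick `now`.

    `victim_marked_at` is the tick at which the current victim was
    chosen, or None if no victim is pending.
    """
    if victim_marked_at is None:          # nobody is currently dying
        return True
    return now - victim_marked_at > OOM_TIMEOUT
```

A second kill is deferred until strictly more than OOM_TIMEOUT ticks
have passed since the first victim was chosen.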

> 2.  I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first).  But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed.  Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
> 
> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway?  I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world.  Maybe this patch was okay.
> 

I think someone (Nick?) proposed a patch at one time that removed most of 
the heuristics from select_bad_process() other than total_vm of the task 
and its children, mems_allowed intersection, and oom_adj.

> 4.  In some cases those children are sharing exactly the same mm,
> yet its total_vm is being added again and again to the points:
> I had a nasty inner loop searching back to see if we'd already
> counted this mm (but then, what if the different tasks sharing
> the mm deserved different adjustments to the total_vm?).
> 

oom_kill_process() may not kill the task selected by select_bad_process();
it will first attempt to kill one of its children with a different mm.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27 21:04                   ` David Rientjes
@ 2009-10-28  0:08                     ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-28  0:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> There are advantages to either approach, but it depends on the contextual
> goal of the oom killer when it's called: kill a rogue task that is 
> allocating more memory than expected,

But it is wrong at counting allocated memory!
Come on, it kills /usr/lib/icedove/run-mozilla.sh: the parent, a shell
script, instead of its children, which allocated the memory. Look, "test"
allocates some (0.1GB) memory, and you have:

% cat test.sh

#!/bin/sh
./test&
./test&
./test&
./test

% perl check_badness.pl|sort -n|g test

26511   7884    test
26511   7885    test
26511   7886    test
26511   7887    test
53994   7883    test.sh

// great, so test.sh "is" the bad ass, ok, emulate OOMK:

% kill -9 7883

// did we kill "a rogue task"?

% perl check_badness.pl|sort -n|g test

26511   7884    test
26511   7885    test
26511   7886    test
26511   7887    test

// nooo, they are still alive and eating our memory!

QED by newbie. ;)

> or kill a task that will free the most memory.

.


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  0:08                     ` Vedran Furač
@ 2009-10-28  0:25                       ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-28  0:25 UTC (permalink / raw)
  To: vedran.furac
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, Vedran Furač wrote:

> But it is wrong at counting allocated memory!
> Come on, it kills /usr/lib/icedove/run-mozilla.sh. Parent, a shell
> script, instead of its child(s) which allocated memory. Look, "test"
> allocates some (0.1GB) memory, and you have:
> 
> % cat test.sh
> 
> #!/bin/sh
> ./test&
> ./test&
> ./test&
> ./test
> 
> % perl check_badness.pl|sort -n|g test
> 
> 26511   7884    test
> 26511   7885    test
> 26511   7886    test
> 26511   7887    test
> 53994   7883    test.sh
> 
> // great, so test.sh "is" the bad ass, ok, emulate OOMK:
> 
> % kill -9 7883
> 
> // did we kill "a rogue task"
> 
> % perl check_badness.pl|sort -n|g test
> 
> 26511   7884    test
> 26511   7885    test
> 26511   7886    test
> 26511   7887    test
> 
> // nooo, they are still alive and eating our memory!
> 

This is wrong; it doesn't "emulate oom" since oom_kill_process() always 
kills a child of the selected process instead if they do not share the 
same memory.  The chosen task in that case is untouched.
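
A minimal model of that child-first rule, using the pids from the
quoted test.sh example. Proc, choose_victim() and the mm numbers are
hypothetical; the order in which children are visited is an
assumption, and the fallback when a child cannot be killed is left
out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proc:
    pid: int
    name: str
    mm: int                                   # identity of the address space
    children: List["Proc"] = field(default_factory=list)


def choose_victim(selected: Proc) -> Proc:
    """Prefer the first child that does not share the parent's mm."""
    for child in selected.children:
        if child.mm != selected.mm:
            return child
    return selected                           # no such child: kill the task itself


# test.sh (7883) with four "test" children, each with its own mm.
script = Proc(7883, "test.sh", mm=1, children=[
    Proc(7884 + i, "test", mm=2 + i) for i in range(4)
])
victim = choose_victim(script)
print(victim.pid, victim.name)
```

Under these assumptions the victim is pid 7884 ("test"), not the
shell script that was selected.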

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  0:25                       ` David Rientjes
@ 2009-10-28  0:39                         ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-28  0:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> This is wrong; it doesn't "emulate oom" since oom_kill_process() always 
> kills a child of the selected process instead if they do not share the 
> same memory.  The chosen task in that case is untouched.

OK, I stand corrected then. Thanks! But, while testing this I lost X
once again and "test" survived for some time (check the timestamps):

http://pastebin.com/d5c9d026e

- It started by killing gkrellm(!!!)
- Then I lost X (kdeinit4 I guess)
- Then 103 seconds after the killing started, it killed "test" - the
real culprit.

I mean... how?!



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  0:39                         ` Vedran Furač
@ 2009-10-28  4:08                           ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-28  4:08 UTC (permalink / raw)
  To: vedran.furac
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, Vedran Furac wrote:

> > This is wrong; it doesn't "emulate oom" since oom_kill_process() always 
> > kills a child of the selected process instead if they do not share the 
> > same memory.  The chosen task in that case is untouched.
> 
> OK, I stand corrected then. Thanks! But, while testing this I lost X
> once again and "test" survived for some time (check the timestamps):
> 
> http://pastebin.com/d5c9d026e
> 
> - It started by killing gkrellm(!!!)
> - Then I lost X (kdeinit4 I guess)
> - Then 103 seconds after the killing started, it killed "test" - the
> real culprit.
> 
> I mean... how?!
> 

Here are the five oom kills that occurred in your log, and notice that the 
first four times it kills a child and not the actual task as I explained:

[97137.724971] Out of memory: kill process 21485 (VBoxSVC) score 1564940 or a child
[97137.725017] Killed process 21503 (VirtualBox)
[97137.864622] Out of memory: kill process 11141 (kdeinit4) score 1196178 or a child
[97137.864656] Killed process 11142 (klauncher)
[97137.888146] Out of memory: kill process 11141 (kdeinit4) score 1184308 or a child
[97137.888180] Killed process 11151 (ksmserver)
[97137.972875] Out of memory: kill process 11141 (kdeinit4) score 1146255 or a child
[97137.972888] Killed process 11224 (audacious2)

Those are practically happening simultaneously with very little memory 
being available between each oom kill.  Only later is "test" killed:

[97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
[97240.206832] Killed process 5005 (test)

Notice how the badness score is less than 1/4th of the others.  So while 
you may find it to be hogging a lot of memory, there were others that 
consumed much more.
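
The comparison can be reproduced from the quoted log lines. A small
sketch; the regular expression is written for exactly this message
format and is an assumption about nothing beyond the lines shown
above.

```python
import re

LOG = """\
[97137.724971] Out of memory: kill process 21485 (VBoxSVC) score 1564940 or a child
[97137.864622] Out of memory: kill process 11141 (kdeinit4) score 1196178 or a child
[97137.888146] Out of memory: kill process 11141 (kdeinit4) score 1184308 or a child
[97137.972875] Out of memory: kill process 11141 (kdeinit4) score 1146255 or a child
[97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
"""

PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\] Out of memory: kill process (\d+) \((.+?)\) score (\d+)"
)

# One (timestamp, pid, name, score) tuple per "Out of memory" line.
events = [
    (float(ts), int(pid), name, int(score))
    for ts, pid, name, score in PATTERN.findall(LOG)
]

first_ts = events[0][0]
test_ts, _, _, test_score = events[-1]
others = [score for _, _, _, score in events[:-1]]

print(round(test_ts - first_ts, 1))        # seconds from first kill to "test"
print(test_score * 4 < min(others))        # is "test" under a quarter of each?
```

On these lines the gap between the first listed kill and the kill of
"test" is roughly 102.5 seconds, and four times the score of "test" is
still below the smallest of the other four scores.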

You can get a more detailed understanding of this by doing

	echo 1 > /proc/sys/vm/oom_dump_tasks

before trying your testcase; it will show various information like the 
total_vm and oom_adj value for each task at the time of oom (and the 
actual badness score is exported per-task via /proc/pid/oom_score in 
real-time).  This will also include the rss and show what the end result 
would be in using that value as part of the heuristic on this particular 
workload compared to the current implementation.
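
For watching the per-task scores from userspace, something like the
following could be used. It is a sketch: read_oom_scores() is a
hypothetical helper, and it relies only on the /proc/pid/oom_score
file mentioned above.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (score, pid) pairs, highest score first."""
    if not os.path.isdir(proc_root):          # e.g. not running on Linux
        return []
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():               # skip non-process entries
            continue
        path = os.path.join(proc_root, entry, "oom_score")
        try:
            with open(path) as handle:
                scores.append((int(handle.read().strip()), int(entry)))
        except (OSError, ValueError):         # task exited or file unreadable
            continue
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    for score, pid in read_oom_scores()[:10]:
        print(score, pid)
```

Run shortly before the testcase, this lists the ten tasks the kernel
currently scores highest.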

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  4:08                           ` David Rientjes
@ 2009-10-28  4:55                             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28  4:55 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009 21:08:56 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 28 Oct 2009, Vedran Furac wrote:
> 
> > > This is wrong; it doesn't "emulate oom" since oom_kill_process() always 
> > > kills a child of the selected process instead if they do not share the 
> > > same memory.  The chosen task in that case is untouched.
> > 
> > OK, I stand corrected then. Thanks! But, while testing this I lost X
> > once again and "test" survived for some time (check the timestamps):
> > 
> > http://pastebin.com/d5c9d026e
> > 
> > - It started by killing gkrellm(!!!)
> > - Then I lost X (kdeinit4 I guess)
> > - Then 103 seconds after the killing started, it killed "test" - the
> > real culprit.
> > 
> > I mean... how?!
> > 
> 
> Here are the five oom kills that occurred in your log, and notice that the 
> first four times it kills a child and not the actual task as I explained:
> 
> [97137.724971] Out of memory: kill process 21485 (VBoxSVC) score 1564940 or a child
> [97137.725017] Killed process 21503 (VirtualBox)
> [97137.864622] Out of memory: kill process 11141 (kdeinit4) score 1196178 or a child
> [97137.864656] Killed process 11142 (klauncher)
> [97137.888146] Out of memory: kill process 11141 (kdeinit4) score 1184308 or a child
> [97137.888180] Killed process 11151 (ksmserver)
> [97137.972875] Out of memory: kill process 11141 (kdeinit4) score 1146255 or a child
> [97137.972888] Killed process 11224 (audacious2)
> 
> Those are practically happening simultaneously with very little memory 
> being available between each oom kill.  Only later is "test" killed:
> 
> [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
> [97240.206832] Killed process 5005 (test)
> 
> Notice how the badness score is less than 1/4th of the others.  So while 
> you may find it to be hogging a lot of memory, there were others that 
> consumed much more.

This is not related to the child-parent problem.

Look more closely at these numbers:
==
[97137.709272] Active_anon:671487 active_file:82 inactive_anon:132316
[97137.709273]  inactive_file:82 unevictable:50 dirty:0 writeback:0 unstable:0
[97137.709273]  free:6122 slab:17179 mapped:30661 pagetables:8052 bounce:0
==

active_file + inactive_file is very low; almost all pages are anon.
But "mapped" (NR_FILE_MAPPED) is a little high. This implies that the remaining
file cache is mapped by many processes, OR that some megabytes of shmem are in use.

The number of pagetable pages is 8052, which means
  8052 x (4096/8) x 4k bytes = ~16 Gbytes of mapped area.

Total available memory is close to active/inactive + slab (plus unevictable,
free and pagetables):
  671487+82+132316+82+50+6122+17179+8052 = 835370 pages x 4k = ~3.2 Gbytes ?
(this system is swapless)
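
The arithmetic above can be checked as follows. The sketch assumes
4 KiB pages and 8-byte page-table entries, and treats every pagetable
page as a full last-level table, so the mapped-area figure is an upper
estimate.

```python
PAGE_SIZE = 4096          # bytes per page (assumed 4 KiB pages)
PTE_SIZE = 8              # bytes per page-table entry (assumed 64-bit)
GIB = 1 << 30

# Address space reachable through 8052 page-table pages.
pagetable_pages = 8052
ptes_per_page = PAGE_SIZE // PTE_SIZE
mapped_bytes = pagetable_pages * ptes_per_page * PAGE_SIZE

# Page counters from the quoted OOM report.
counters = {
    "active_anon": 671487, "active_file": 82,
    "inactive_anon": 132316, "inactive_file": 82,
    "unevictable": 50, "free": 6122,
    "slab": 17179, "pagetables": 8052,
}
total_pages = sum(counters.values())
total_bytes = total_pages * PAGE_SIZE

print(ptes_per_page, round(mapped_bytes / GIB, 1))
print(total_pages, round(total_bytes / GIB, 1))
```

This gives 512 entries per page and about 15.7 GiB of mappable area,
and 835370 pages or about 3.2 GiB accounted for by the counters.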

Then, considering the pmap output Kosaki shows,
I guess the killed tasks had a big total_vm but not much real rss,
so killing them did not help the oom situation.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  4:55                             ` KAMEZAWA Hiroyuki
@ 2009-10-28  5:13                               ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-28  5:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:

> This is not related to the child-parent problem.
> 
> Looking at these numbers more closely:
> ==
> [97137.709272] Active_anon:671487 active_file:82 inactive_anon:132316
> [97137.709273]  inactive_file:82 unevictable:50 dirty:0 writeback:0 unstable:0
> [97137.709273]  free:6122 slab:17179 mapped:30661 pagetables:8052 bounce:0
> ==
> 
> active_file + inactive_file is very low. Almost all pages are anon.
> But "mapped(NR_FILE_MAPPED)" is a little high. This implies the remaining file caches
> are mapped by many processes OR some megabytes of shmem are in use.
> 
> The number of pagetable pages is 8052, which means
>   8052 x (4096/8) x 4k bytes = 16Gbytes of mapped area.
> 
> Total available memory is close to active/inactive + unevictable + free + slab + pagetables:
> (671487+82+132316+82+50+6122+17179+8052) = 835370 pages x 4k = 3.2Gbytes ?
> (this system is swapless)
> 

Yep:

[97137.724965] 917504 pages RAM
[97137.724967] 69721 pages reserved

(917504 - 69721) * 4K = ~3.23G

> Then, considering the pmap output Kosaki showed,
> I guess the killed ones had a big total_vm but not much real rss,
> so killing them did not help with the oom.
> 

echo 1 > /proc/sys/vm/oom_dump_tasks can confirm that.
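
For reference, the same switch can be flipped from a script. A minimal sketch, assuming the sysctl is exposed at /proc/sys/vm/oom_dump_tasks as in the command above and that the caller has root privileges; the helper name set_sysctl_flag is made up for illustration.

```python
from pathlib import Path

OOM_DUMP_TASKS = Path("/proc/sys/vm/oom_dump_tasks")


def set_sysctl_flag(path: Path, enabled: bool) -> str:
    """Write 1 or 0 to a sysctl file and return the value read back."""
    path.write_text("1\n" if enabled else "0\n")
    return path.read_text().strip()


if __name__ == "__main__":
    # Needs root; equivalent to: echo 1 > /proc/sys/vm/oom_dump_tasks
    try:
        print("oom_dump_tasks =", set_sysctl_flag(OOM_DUMP_TASKS, True))
    except OSError as exc:
        print(f"could not update {OOM_DUMP_TASKS}: {exc}")
```

With the flag set, the next oom kill should log a per-task table that makes the total_vm versus rss comparison possible.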

The bigger issue is making the distinction between killing a rogue task 
that is using much more memory than expected (the supposed current 
behavior, influenced from userspace by /proc/pid/oom_adj), and killing the 
task with the highest rss.  The latter is definitely desired if we are 
allocating tons of memory but reduces the ability of the user to influence 
the badness score.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  5:13                               ` David Rientjes
@ 2009-10-28  6:05                                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28  6:05 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009 22:13:44 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> Yep:
> 
> [97137.724965] 917504 pages RAM
> [97137.724967] 69721 pages reserved
> 
> (917504 - 69721) * 4K = ~3.23G
> 
> > Then, considering the pmap output Kosaki showed,
> > I guess the killed ones had a big total_vm but not much real rss,
> > so killing them did not help with the oom.
> > 
> 
> echo 1 > /proc/sys/vm/oom_dump_tasks can confirm that.
> 
yes.

> The bigger issue is making the distinction between killing a rogue task 
> that is using much more memory than expected (the supposed current 
> behavior, influenced from userspace by /proc/pid/oom_adj), and killing the 
> task with the highest rss. 

All kernel engineers know that "more than expected or not" can never be known to the kernel.
So the oom_adj workaround is used now (by some special users).
The OOM killer itself is also a workaround.
"No kill" is the best thing, but we know there tend to be memory leakers on bad
systems, and not all systems in this world are perfect.

From the kernel's view, there is no difference between a rogue task and the one with the highest rss.
As a heuristic, "time" is used now, but it's not very trustworthy.

> The latter is definitely desired if we are 
> allocating tons of memory but reduces the ability of the user to influence 
> the badness score.
> 

Yes, some more trustworthy values other than vmsize/rss/time would be appreciated.
I wonder whether recent memory consumption speed could be another key value.

Anyway, the current behavior of "killing X" is a bad thing.
We need some fixes.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  6:05                                 ` KAMEZAWA Hiroyuki
@ 2009-10-28  6:17                                   ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-28  6:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:

> All kernel engineers know that "more than expected or not" can never be known to the kernel.
> So the oom_adj workaround is used now (by some special users).
> The OOM killer itself is also a workaround.
> "No kill" is the best thing, but we know there tend to be memory leakers on bad
> systems, and not all systems in this world are perfect.
> 

Right, and historically that has been addressed by considering total_vm 
and adjusting it with oom_adj so that we can identify memory leaking tasks 
through user-defined criteria.

> Yes, some more trustworthy values other than vmsize/rss/time would be appreciated.
> I wonder whether recent memory consumption speed could be another key value.
> 

Sounds very logical.

> Anyway, the current behavior of "killing X" is a bad thing.
> We need some fixes.
> 

You can easily protect X with OOM_DISABLE, as you know.  I don't think we 
need any X-specific heuristics added to the kernel, it looks like the 
special cases have already polluted badness() enough.
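
As an illustration of the OOM_DISABLE approach, here is a minimal sketch that writes to /proc/<pid>/oom_adj. It assumes the oom_adj interface discussed in this thread, with -17 meaning OOM_DISABLE and +15 the most killable setting; the helper names and the proc_root parameter are hypothetical, and lowering the value normally requires root.

```python
from pathlib import Path

OOM_DISABLE = -17   # task is never selected by the oom killer
OOM_ADJ_MAX = 15    # task becomes the preferred victim


def oom_adj_path(pid: int, proc_root: str = "/proc") -> Path:
    """Location of the oom_adj file for a given pid."""
    return Path(proc_root) / str(pid) / "oom_adj"


def set_oom_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Validate the value and write it to the task's oom_adj file."""
    if not OOM_DISABLE <= value <= OOM_ADJ_MAX:
        raise ValueError(f"oom_adj must be in [{OOM_DISABLE}, {OOM_ADJ_MAX}]")
    oom_adj_path(pid, proc_root).write_text(f"{value}\n")
```

Protecting the X server would then be a call such as set_oom_adj(xorg_pid, OOM_DISABLE), where xorg_pid is whatever pid the server is running under.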

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  6:17                                   ` David Rientjes
@ 2009-10-28  6:20                                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28  6:20 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009 23:17:41 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:
> 
> > All kernel engineers know that "more than expected or not" can never be known to the kernel.
> > So the oom_adj workaround is used now (by some special users).
> > The OOM killer itself is also a workaround.
> > "No kill" is the best thing, but we know there tend to be memory leakers on bad
> > systems, and not all systems in this world are perfect.
> > 
> 
> Right, and historically that has been addressed by considering total_vm 
> and adjusting it with oom_adj so that we can identify memory leaking tasks 
> through user-defined criteria.
> 
> > Yes, some more trustworthy values other than vmsize/rss/time would be appreciated.
> > I wonder whether recent memory consumption speed could be another key value.
> > 
> 
> Sounds very logical.
> 
> > Anyway, the current behavior of "killing X" is a bad thing.
> > We need some fixes.
> > 
> 
> You can easily protect X with OOM_DISABLE, as you know.  I don't think we 
> need any X-specific heuristics added to the kernel, it looks like the 
> special cases have already polluted badness() enough.
> 
It's _not_ special to X.

Almost all applications which use many dynamic libraries can be affected by this
use of total_vm. And, as I explained to Vedran, a multi-threaded program like Java can easily
increase total_vm without using much anon_rss.
And that's the reason I hate overcommit_memory. The size of the VM doesn't tell us anything.
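
To make the point concrete, here is a small sketch that picks a victim from the same set of tasks once by total_vm and once by rss. All numbers and task names are hypothetical and chosen only to show how the two baselines can disagree; this is not kernel code.

```python
# Hypothetical snapshot: (name, total_vm in pages, rss in pages).
tasks = [
    ("java-app", 900_000, 40_000),   # many threads and libraries, little resident
    ("Xorg", 150_000, 30_000),
    ("test", 700_000, 650_000),      # the actual memory hog
]

# Victim chosen by address space size versus by resident memory.
by_total_vm = max(tasks, key=lambda t: t[1])[0]
by_rss = max(tasks, key=lambda t: t[2])[0]

print("largest total_vm:", by_total_vm)
print("largest rss:     ", by_rss)
```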


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  6:20                                     ` KAMEZAWA Hiroyuki
@ 2009-10-29  8:38                                       ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-29  8:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:

> It's _not_ special to X.
> 
> Almost all applications which use many dynamic libraries can be affected by this
> use of total_vm. And, as I explained to Vedran, a multi-threaded program like Java can easily
> increase total_vm without using much anon_rss.
> And that's the reason I hate overcommit_memory. The size of the VM doesn't tell us anything.
> 

Right, because Vedran's latest oom log shows that Xorg is preferred over any 
other thread except the memory hogging test program, and more so with your 
patch than without.  I pointed out a clear distinction in the killing 
order using both total_vm and rss in that log, and in my opinion killing 
Xorg as opposed to krunner would be undesirable.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29  8:38                                       ` David Rientjes
@ 2009-10-29 11:11                                         ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-29 11:11 UTC (permalink / raw)
  To: David Rientjes
  Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> Right, because Vedran's latest oom log shows that Xorg is preferred over any 
> other thread except the memory hogging test program, and more so with your 
> patch than without.  I pointed out a clear distinction in the killing 
> order using both total_vm and rss in that log, and in my opinion killing 
> Xorg as opposed to krunner would be undesirable.

But then you should rename OOM killer to TRIPK:
Totally Random Innocent Process Killer

If you have an OOM situation and Xorg is first in line, that means it's leaking
memory badly and the system is probably already frozen/FUBAR. Killing
krunner in that situation wouldn't do any good. From a user's perspective,
nothing changes: the system is still FUBAR and (s)he would probably reboot,
cursing Linux in the process.


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29 11:11                                         ` Vedran Furač
@ 2009-10-29 19:53                                           ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-29 19:53 UTC (permalink / raw)
  To: vedran.furac
  Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009, Vedran Furac wrote:

> But then you should rename OOM killer to TRIPK:
> Totally Random Innocent Process Killer
> 

The randomness here is the order of the child list when the oom killer 
selects a task, based on the badness score, and then tries to kill a child 
with a different mm before the parent.

The problem you identified in http://pastebin.com/f3f9674a0, however, is a 
forkbomb issue where the badness score should never have been so high for 
kdeinit4 compared to "test".  That's a direct result of adding the 
scores of all disjoint child total_vm values into the badness score for 
the parent and then killing the children instead.

That's the problem, not using total_vm as a baseline.  Replacing that with 
rss is not going to solve the issue and reducing the user's ability to 
specify a rough oom priority from userspace is simply not an option.
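
The effect can be sketched with a toy model. This is a deliberate simplification and not the kernel's badness() function: it assumes the parent is charged half of each child's total_vm when the child has its own mm, and it models oom_adj as a bit shift of the score. The Task class, the toy_badness name, and all the numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    total_vm: int                 # pages of virtual address space
    oom_adj: int = 0              # -17..15, as in /proc/pid/oom_adj
    children: List["Task"] = field(default_factory=list)


def toy_badness(task: Task) -> int:
    """Simplified score: own total_vm plus a share of each child's total_vm."""
    if task.oom_adj == -17:       # OOM_DISABLE: never a candidate
        return 0
    points = task.total_vm
    for child in task.children:   # assumed: every child has a different mm
        points += child.total_vm // 2 + 1
    # oom_adj modeled as a bit shift of the score (assumption).
    if task.oom_adj > 0:
        points <<= task.oom_adj
    elif task.oom_adj < 0:
        points >>= -task.oom_adj
    return points


hog = Task("test", total_vm=700_000)
launcher = Task(
    "kdeinit4",
    total_vm=50_000,
    children=[Task(f"child{i}", total_vm=60_000) for i in range(40)],
)

for t in (hog, launcher):
    print(t.name, toy_badness(t))
```

In this model the small parent with many children outscores the single large task, which matches the ordering seen in the log: the parent is selected and its children are killed first.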

> If you have an OOM situation and Xorg is first in line, that means it's leaking
> memory badly and the system is probably already frozen/FUBAR. Killing
> krunner in that situation wouldn't do any good. From a user's perspective,
> nothing changes: the system is still FUBAR and (s)he would probably reboot,
> cursing Linux in the process.
> 

It depends on what you're running; we need to have the option 
of protecting very large tasks on production servers.  Imagine that "test" 
here is actually a critical application that we need to protect, that it's 
not solely mlocked anonymous memory, and that we still want it killed if it 
leaks memory beyond your approximate 2.5GB.  How do you do that when using rss 
as the baseline?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29 19:53                                           ` David Rientjes
@ 2009-10-29 23:48                                             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-29 23:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009 12:53:42 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> > If you have an OOM situation and Xorg is first in line, that means it's leaking
> > memory badly and the system is probably already frozen/FUBAR. Killing
> > krunner in that situation wouldn't do any good. From a user's perspective,
> > nothing changes: the system is still FUBAR and (s)he would probably reboot,
> > cursing Linux in the process.
> > 
> 
> It depends on what you're running; we need to have the option 
> of protecting very large tasks on production servers.  Imagine that "test" 
> here is actually a critical application that we need to protect, that it's 
> not solely mlocked anonymous memory, and that we still want it killed if it 
> leaks memory beyond your approximate 2.5GB.  How do you do that when using rss 
> as the baseline?

As I wrote repeatedly,

   - The OOM killer itself is a bad thing, a bad situation.
   - The kernel can't know whether a program is bad or not; it can only guess.
   - So there is no "correct" OOM killer other than a fork-bomb killer.
   - The user has a knob, oom_adj. This is very strong.

So there is only a "reasonable" or "easy-to-understand" OOM kill.
"The current biggest memory eater is killed" sounds reasonable and easy to
understand. And if total_vm works well, overcommit_guess should catch it.
Please improve overcommit_guess if you want to stay on total_vm.


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29 23:48                                             ` KAMEZAWA Hiroyuki
@ 2009-10-30  9:10                                               ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-30  9:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:

> As I wrote repeatedly,
> 
>    - The OOM killer itself is a bad thing, a bad situation.

Not necessarily; the memory controller and cpusets use it quite often to 
enforce their policy, and that is standard runtime behavior.  We'd like to 
imagine that our cpuset will never be too small to run all the attached 
jobs, but that happens and we can easily recover from it by killing a 
task.

>    - The kernel can't know whether a program is bad or not; it can only guess.

Totally irrelevant, given your fourth point about /proc/pid/oom_adj.  We 
can tell the kernel what we'd like the oom killer behavior to be if 
the situation arises.

>    - So there is no "correct" OOM killer other than a fork-bomb killer.

Well of course there is; you're seeing this in a WAY too simplistic 
manner.  If we are oom, we want to be able to influence how the oom killer 
behaves and respond to that situation.  You are proposing that we change 
the baseline for how the oom killer selects tasks which we use CONSTANTLY 
as part of our normal production environment.  I'd appreciate it if you'd 
take it a little more seriously.

>    - The user has a knob, oom_adj. This is very strong.
> 

Agreed.

> So there is only a "reasonable" or "easy-to-understand" OOM kill.
> "The current biggest memory eater is killed" sounds reasonable and easy to
> understand. And if total_vm works well, overcommit_guess should catch it.
> Please improve overcommit_guess if you want to stay on total_vm.
> 

I don't necessarily want to stay on total_vm, but I also don't want to 
move to rss as a baseline, as you would probably agree.

We disagree about a very fundamental principle: you are coming from a 
perspective of always wanting to kill the biggest resident memory eater 
even for a single order-0 allocation that fails and I'm coming from a 
perspective of wanting to ensure that our machines know how the oom killer 
will react when it is used.  Moving to rss reduces the ability of the user 
to specify an expected oom priority other than polarizing it by either 
disabling it completely with an oom_adj value of -17 or choosing the 
definite next victim with +15.  That's my objection to it: the user cannot 
possibly be expected to predict what proportion of each application's 
memory will be resident at the time of oom.

I understand you want to totally rewrite the oom killer for whatever 
reason, but I think you need to spend a lot more time understanding the 
needs that the Linux community has for its behavior instead of insisting 
on your point of view.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30  9:10                                               ` David Rientjes
@ 2009-10-30  9:36                                                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-30  9:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009 02:10:37 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> >    - The kernel can't know the program is bad or not. just guess it.
> 
> Totally irrelevant, given your fourth point about /proc/pid/oom_adj.  We 
> can tell the kernel what we'd like the oom killer behavior should be if 
> the situation arises.
> 

My point is that the server cannot distinguish a memory leak from intentional
memory usage. Nothing more than that.



> >    - Then, there is no "correct" OOM-Killer other than fork-bomb killer.
> 
> Well of course there is, you're seeing this is a WAY too simplistic 
> manner.  If we are oom, we want to be able to influence how the oom killer 
> behaves and respond to that situation.  You are proposing that we change 
> the baseline for how the oom killer selects tasks which we use CONSTANTLY 
> as part of our normal production environment.  I'd appreciate it if you'd 
> take it a little more seriously.
> 
Yes, I'm serious.

This summer, at lunch with a daily Linux user, I was told,
"You enterprise guys don't consider desktop or laptop problems at all."
Yes, I use only servers. My customers use servers, too. My first priority
is always server users.
But this time I replied to Vedran and am trying to fix the desktop problem.
Even if the current logic works well for servers, the "KDE/GNOME is killed"
problem seems to be serious. And this may be a problem for EMBEDDED people,
I guess.


> >    - User has a knob as oom_adj. This is very strong.
> > 
> 
> Agreed.
> 
This and memcg are very useful. But everyone says "bad workaround" ;(
Maybe only servers can use these functions.

> > Then, there is only "reasonable" or "easy-to-understand" OOM-Kill.
> > "Current biggest memory eater is killed" sounds reasonable, easy to
> > understand. And if total_vm works well, overcommit_guess should catch it.
> > Please improve overcommit_guess if you want to stay on total_vm.
> > 
> 
> I don't necessarily want to stay on total_vm, but I also don't want to 
> move to rss as a baseline, as you would probably agree.
> 
I'll rewrite it all. I won't rely only on rss. There are several situations,
and we need more information than we have now. I'll have to implement
ways to gather that information before changing badness.


> We disagree about a very fundamental principle: you are coming from a 
> perspective of always wanting to kill the biggest resident memory eater 
> even for a single order-0 allocation that fails and I'm coming from a 
> perspective of wanting to ensure that our machines know how the oom killer 
> will react when it is used. 
yes.

> Moving to rss reduces the ability of the user to specify an expected oom
> priority other than polarizing it by either 
> disabling it completely with an oom_adj value of -17 or choosing the 
> definite next victim with +15.  That's my objection to it: the user cannot 
> possibly be expected to predict what proportion of each application's 
> memory will be resident at the time of oom.
> 
I can say the same thing about total_vm size. total_vm size doesn't carry any
good information for an oom situation. And tweaking based on that not-useful
parameter will make things worse.

For the oom_adj tweak, we may need a technique other than "shift".
If I had written oom_adj, I would have written it as

   /proc/<pid>/guarantee_nooom_size

  #echo 3G > /proc/<pid>/guarantee_nooom_size

  Then, 3G bytes of this process's memory usage will not be counted toward badness.

I'm not sure I can add a new interface or replace oom_adj now.
But to do this, the current children's score problem etc. should be fixed.
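
To make the contrast concrete, here is a minimal sketch comparing the two
styles of adjustment. It assumes that "shift" means oom_adj is applied to the
badness score as a bit shift, and that the proposed guarantee_nooom_size
simply subtracts the guaranteed bytes from the accounted usage. Neither
function is kernel code: parse_size, shift_adjust and guarantee_adjust are
hypothetical names, binary units are assumed for "3G", and
guarantee_nooom_size is only a proposal in this message.

```python
def parse_size(text):
    """Parse sizes such as '3G', '512M' or '4096' into bytes.

    Binary units (1G = 2**30) are an assumption of this sketch.
    """
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def shift_adjust(points, oom_adj):
    """Scale a badness score by a power of two, as a 'shift' knob would.

    The special 'disable' value (-17 in the thread) is not modelled here.
    """
    if oom_adj > 0:
        # A zero score is bumped to 1 so that the shift has an effect.
        return max(points, 1) << oom_adj
    return points >> -oom_adj


def guarantee_adjust(usage_bytes, guarantee_bytes):
    """Exclude a guaranteed amount of memory from the accounted usage."""
    return max(0, usage_bytes - guarantee_bytes)
```

With a shift, the user states a multiplier and has to guess what the baseline
will be at oom time; with a guarantee, the user states an expected size in
bytes and only usage beyond that size counts. For example,
guarantee_adjust(4 * 2**30, parse_size("3G")) leaves 1G accountable.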

> I understand you want to totally rewrite the oom killer for whatever 
> reason, but I think you need to spend a lot more time understanding the 
> needs that the Linux community has for its behavior instead of insisting 
> on your point of view.
> 
Yes, I'll take more time. I don't think all of the changes can be done quickly.

To be honest, this is part of the work to implement a "custom oom handler" cgroup.
Before going further, I'd like to fix the current problem.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30  9:36                                                 ` KAMEZAWA Hiroyuki
  (?)
@ 2009-10-30 10:49                                                 ` Thomas Fjellstrom
  -1 siblings, 0 replies; 128+ messages in thread
From: Thomas Fjellstrom @ 2009-10-30 10:49 UTC (permalink / raw)
  To: linux-kernel

On Fri October 30 2009, KAMEZAWA Hiroyuki wrote:
> On Fri, 30 Oct 2009 02:10:37 -0700 (PDT)
> 
> David Rientjes <rientjes@google.com> wrote:
> > >    - The kernel can't know the program is bad or not. just guess it.
> >
> > Totally irrelevant, given your fourth point about /proc/pid/oom_adj. 
> > We can tell the kernel what we'd like the oom killer behavior should be
> > if the situation arises.
> 
> My point is that the server cannot distinguish memory leak from
>  intentional memory usage. No other than that.
> 
> > >    - Then, there is no "correct" OOM-Killer other than fork-bomb
> > > killer.
> >
> > Well of course there is, you're seeing this is a WAY too simplistic
> > manner.  If we are oom, we want to be able to influence how the oom
> > killer behaves and respond to that situation.  You are proposing that
> > we change the baseline for how the oom killer selects tasks which we
> > use CONSTANTLY as part of our normal production environment.  I'd
> > appreciate it if you'd take it a little more seriously.
> 
> Yes, I'm serious.
> 
> In this summer, at lunch with a daily linux user, I was said
> "you, enterprise guys, don't consider desktop or laptop problem at all."
> yes, I use only servers. My customer uses server, too. My first priority
> is always on server users.
> But, for this time, I wrote reply to Vedran and try to fix desktop
>  problem. Even if current logic works well for servers, "KDE/GNOME is
>  killed" problem seems to be serious. And this may be a problem for
>  EMBEDDED people, I guess.

What's worse is that a friend of mine gets stuck with a useless machine for a 
couple of hours or more when the oom killer tries to do its thing. It swap 
storms for hours. Not a good thing, imo.

[snip]


-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30  9:36                                                 ` KAMEZAWA Hiroyuki
@ 2009-11-03 20:49                                                   ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-11-03 20:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:

> > >    - The kernel can't know the program is bad or not. just guess it.
> > 
> > Totally irrelevant, given your fourth point about /proc/pid/oom_adj.  We 
> > can tell the kernel what we'd like the oom killer behavior should be if 
> > the situation arises.
> > 
> 
> My point is that the server cannot distinguish memory leak from intentional
> memory usage. No other than that.
> 

That's a different point.  Today, we can influence the badness score of 
any user thread to prioritize oom killing from userspace and that can be 
done regardless of whether there's a memory leaker, a fork bomber, etc.  
Priority-based oom killing is important to production scenarios and 
cannot be replaced by a heuristic that works every time if it cannot be 
influenced by userspace.

A spike in memory consumption when a process is initially forked would be 
defined as a memory leaker in your quiet_time model.

> In this summer, at lunch with a daily linux user, I was said
> "you, enterprise guys, don't consider desktop or laptop problem at all."
> yes, I use only servers. My customer uses server, too. My first priority
> is always on server users.
> But, for this time, I wrote reply to Vedran and try to fix desktop problem.
> Even if current logic works well for servers, "KDE/GNOME is killed" problem
> seems to be serious. And this may be a problem for EMBEDDED people, I guess.
> 

You argued before that the problem wasn't specific to X (after I said you 
could protect it very trivially with /proc/pid/oom_adj set to 
OOM_DISABLE), but that's now your reasoning for rewriting the oom killer 
heuristics?

> I can say the same thing to total_vm size. total_vm size doesn't include any
> good information for oom situation. And tweaking based on that not-useful
> parameter will make things worse.
> 

Tweaking on the heuristic will probably make it more convoluted and 
overall worse, I agree.  But it's a more stable baseline than rss from 
which we can set oom killing priorities from userspace.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-11-03 20:49                                                   ` David Rientjes
@ 2009-11-04  0:50                                                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-04  0:50 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 3 Nov 2009 12:49:52 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:
> 
> > > >    - The kernel can't know the program is bad or not. just guess it.
> > > 
> > > Totally irrelevant, given your fourth point about /proc/pid/oom_adj.  We 
> > > can tell the kernel what we'd like the oom killer behavior should be if 
> > > the situation arises.
> > > 
> > 
> > My point is that the server cannot distinguish memory leak from intentional
> > memory usage. No other than that.
> > 
> 
> That's a different point.  Today, we can influence the badness score of 
> any user thread to prioritize oom killing from userspace and that can be 
> done regardless of whether there's a memory leaker, a fork bomber, etc.  
> The priority based oom killing is important to production scenarios and 
> cannot be replaced by a heuristic that works everytime if it cannot be 
> influenced by userspace.
> 
I didn't remove oom_adj...

> A spike in memory consumption when a process is initially forked would be 
> defined as a memory leaker in your quiet_time model.
> 
I'll rewrite or drop quiet_time.

> > In this summer, at lunch with a daily linux user, I was said
> > "you, enterprise guys, don't consider desktop or laptop problem at all."
> > yes, I use only servers. My customer uses server, too. My first priority
> > is always on server users.
> > But, for this time, I wrote reply to Vedran and try to fix desktop problem.
> > Even if current logic works well for servers, "KDE/GNOME is killed" problem
> > seems to be serious. And this may be a problem for EMBEDDED people, I guess.
> > 
> 
> You argued before that the problem wasn't specific to X (after I said you 
> could protect it very trivially with /proc/pid/oom_adj set to 
> OOM_DISABLE), but that's now your reasoning for rewriting the oom killer 
> heuristics?
> 
That is one of the reasons. My customers always suffer from the
"OOM-RANDOM-KILLER". I mentioned the "lunch" to say that "I'm not working
_only_ for servers."
OK?


> > I can say the same thing to total_vm size. total_vm size doesn't include any
> > good information for oom situation. And tweaking based on that not-useful
> > parameter will make things worse.
> > 
> 
> Tweaking on the heuristic will probably make it more convoluted and 
> overall worse, I agree.  But it's a more stable baseline than rss from 
> which we can set oom killing priorities from userspace.

- "rss < total_vm_size" always.
- oom_adj calculation is quite strong.
- total_vm of processes which map hugetlb is very big... but killing them
  is no help for the usual oom.

I recommend that you add a "stable baseline" knob for user space, as I wrote.
My patch 6 adds a stable baseline bonus of 50% of vm size if run_time is
large enough.

If users can estimate how their processes use memory, that will be a good thing.
I'll add something other than oom_adj (I'm not saying I'll drop oom_adj).

Thanks,
-Kame







^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-11-04  0:50                                                     ` KAMEZAWA Hiroyuki
@ 2009-11-04  1:58                                                       ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-11-04  1:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:

> > That's a different point.  Today, we can influence the badness score of 
> > any user thread to prioritize oom killing from userspace and that can be 
> > done regardless of whether there's a memory leaker, a fork bomber, etc.  
> > The priority based oom killing is important to production scenarios and 
> > cannot be replaced by a heuristic that works everytime if it cannot be 
> > influenced by userspace.
> > 
> I don't removed oom_adj...
> 

Right, but we must ensure that we have the same ability to influence a 
priority based oom killing scheme from userspace as we currently do with a 
relatively static total_vm.  total_vm may not be the optimal baseline, but 
it does allow users to tune oom_adj specifically to identify tasks that 
are using more memory than expected and to be static enough to not depend 
on rss, for example, that is really hard to predict at the time of oom.

That's actually my main goal in this discussion: to avoid losing any 
ability of userspace to influence the priority of tasks being oom killed 
(if you haven't noticed :).

> > Tweaking on the heuristic will probably make it more convoluted and 
> > overall worse, I agree.  But it's a more stable baseline than rss from 
> > which we can set oom killing priorities from userspace.
> 
> - "rss < total_vm_size" always.

But rss is much more dynamic than total_vm, that's my point.
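
For anyone who wants to watch the two quantities side by side, here is a
minimal sketch that reads both from /proc/<pid>/status. It assumes that the
VmSize line corresponds to total_vm and the VmRSS line to rss, both reported
in kB; the function names parse_vm_fields and read_vm_fields are hypothetical.

```python
def parse_vm_fields(status_text):
    """Return (total_vm_kb, rss_kb) parsed from /proc/<pid>/status text.

    Either element is None when the field is absent, as it is for kernel
    threads, which have no address space of their own.
    """
    fields = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("VmSize", "VmRSS"):
            # Values look like "   10240 kB"; keep only the number.
            fields[key] = int(rest.split()[0])
    return fields.get("VmSize"), fields.get("VmRSS")


def read_vm_fields(pid="self"):
    """Read the two fields for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

Sampling read_vm_fields() for a long-running process at intervals gives a
rough feel for how much each number moves over time.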

> - oom_adj calculation is quite strong.
> - total_vm of processes which maps hugetlb is very big ....but killing them
>   is no help for usual oom.
> 
> I recommend you to add "stable baseline" knob for user space, as I wrote.
> My patch 6 adds stable baseline bonus as 50% of vm size if run_time is enough
> large.
> 

There's no clear relationship between VM size and runtime.  The forkbomb 
heuristic itself could easily return a badness of ULONG_MAX if one is 
detected using runtime and number of children, as I earlier proposed, but 
that doesn't seem helpful to factor into the scoring. 
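
A minimal sketch of that idea follows: forkbomb detection acts as a
short-circuit that returns the maximum badness and is not blended into the
normal score. The detection rule and both thresholds are assumptions made up
for this example (the message only says "runtime and number of children"),
the name forkbomb_badness is hypothetical, and ULONG_MAX is given its 64-bit
value.

```python
ULONG_MAX = 2**64 - 1  # assumes a 64-bit unsigned long


def forkbomb_badness(base_points, runtime_seconds, nr_children,
                     min_runtime=60, max_children=500):
    """Return ULONG_MAX for a suspected forkbomb, else the unchanged score.

    The rule (a young task with very many children) and the default
    thresholds are illustrative assumptions, not values from any patch.
    """
    if runtime_seconds < min_runtime and nr_children > max_children:
        return ULONG_MAX
    return base_points
```

Because the ordinary score passes through untouched, a priority set from
userspace on top of that score keeps its meaning for every task that is not
flagged.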

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-11-04  1:58                                                       ` David Rientjes
@ 2009-11-04  2:17                                                         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-04  2:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 3 Nov 2009 17:58:04 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:
> 
> > > That's a different point.  Today, we can influence the badness score of 
> > > any user thread to prioritize oom killing from userspace and that can be 
> > > done regardless of whether there's a memory leaker, a fork bomber, etc.  
> > > The priority based oom killing is important to production scenarios and 
> > > cannot be replaced by a heuristic that works every time if it cannot be 
> > > influenced by userspace.
> > > 
> > I didn't remove oom_adj...
> > 
> 
> Right, but we must ensure that we have the same ability to influence a 
> priority based oom killing scheme from userspace as we currently do with a 
> relatively static total_vm.  total_vm may not be the optimal baseline, but 
> it does allow users to tune oom_adj specifically to identify tasks that 
> are using more memory than expected and to be static enough to not depend 
> on rss, for example, that is really hard to predict at the time of oom.
> 
> That's actually my main goal in this discussion: to avoid losing any 
> ability of userspace to influence the priority of tasks being oom killed 
> (if you haven't noticed :).
> 
> > > Tweaking on the heuristic will probably make it more convoluted and 
> > > overall worse, I agree.  But it's a more stable baseline than rss from 
> > > which we can set oom killing priorities from userspace.
> > 
> > - "rss < total_vm_size" always.
> 
> But rss is much more dynamic than total_vm, that's my point.
> 
My point and your point are different.

  1. My only concern is the "baseline for heuristics".
  2. Your only concern is the "baseline for a knob, such as oom_adj".

OK? For selecting a victim in the kernel, a dynamic value is much more useful.
The current behavior of "random kill" and "kill multiple processes" is too bad.
Considering what the oom-killer is for, I think "1" is more important.

But I know what you want, so I am offering a new knob which is not affected by
RSS, as I wrote in my previous mail.

Off-topic:
As memcg keeps getting better, I think using the OOM killer for resource
control should end. Maybe fake-NUMA + cpuset works well for Google's systems,
but please consider using memcg.



> > - oom_adj calculation is quite strong.
> > - total_vm of processes which map hugetlb is very big ....but killing them
> >   is no help for usual oom.
> > 
> > I recommend that you add a "stable baseline" knob for user space, as I wrote.
> > My patch 6 adds a stable baseline bonus of 50% of vm size if run_time is
> > large enough.
> > 
> 
> There's no clear relationship between VM size and runtime.  The forkbomb 
> heuristic itself could easily return a badness of ULONG_MAX if one is 
> detected using runtime and number of children, as I earlier proposed, but 
> that doesn't seem helpful to factor into the scoring. 
> 

Old processes are important, younger ones are not. But as I wrote, I'll drop
most of patch "6". So, please forget about this part.

I'm now interested in a fork-bomb killer rather than a crazy badness calculation.

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-11-04  2:17                                                         ` KAMEZAWA Hiroyuki
@ 2009-11-04  3:10                                                           ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-11-04  3:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:

> My point and your point are different.
> 
>   1. My only concern is the "baseline for heuristics".
>   2. Your only concern is the "baseline for a knob, such as oom_adj".
> 
> OK? For selecting a victim in the kernel, a dynamic value is much more useful.
> The current behavior of "random kill" and "kill multiple processes" is too bad.
> Considering what the oom-killer is for, I think "1" is more important.
> 
> But I know what you want, so I am offering a new knob which is not affected by
> RSS, as I wrote in my previous mail.
> 
> Off-topic:
> As memcg keeps getting better, I think using the OOM killer for resource
> control should end. Maybe fake-NUMA + cpuset works well for Google's systems,
> but please consider using memcg.
> 

I understand what you're trying to do, and I agree with it for most 
desktop systems.  However, I think that admins should have a very strong 
influence in what tasks the oom killer kills.  It doesn't really matter if 
it's via oom_adj or not, and it's debatable whether an adjustment on a 
static heuristic score is in our best interest in the first place.  But we 
must have an alternative so that our control over oom killing isn't lost.

I'd also like to open another topic for discussion if you're proposing 
such sweeping changes: at what point do we allow ~__GFP_NOFAIL allocations 
to fail even if order < PAGE_ALLOC_COSTLY_ORDER and defer killing 
anything?  We both agreed that it's not always in the best interest to 
kill a task so that an allocation can succeed, so we need to define some 
criteria to simply fail the allocation instead.

> Old processes are important, younger ones are not. But as I wrote, I'll drop
> most of patch "6". So, please forget about this part.
> 
> I'm now interested in a fork-bomb killer rather than a crazy badness calculation.
> 

Ok, great.  Thanks.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-11-04  3:10                                                           ` David Rientjes
@ 2009-11-04  3:19                                                             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-04  3:19 UTC (permalink / raw)
  To: David Rientjes
  Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 3 Nov 2009 19:10:34 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:
> 
> > My point and your point are different.
> > 
> >   1. My only concern is the "baseline for heuristics".
> >   2. Your only concern is the "baseline for a knob, such as oom_adj".
> > 
> > OK? For selecting a victim in the kernel, a dynamic value is much more useful.
> > The current behavior of "random kill" and "kill multiple processes" is too bad.
> > Considering what the oom-killer is for, I think "1" is more important.
> > 
> > But I know what you want, so I am offering a new knob which is not affected by
> > RSS, as I wrote in my previous mail.
> > 
> > Off-topic:
> > As memcg keeps getting better, I think using the OOM killer for resource
> > control should end. Maybe fake-NUMA + cpuset works well for Google's systems,
> > but please consider using memcg.
> > 
> 
> I understand what you're trying to do, and I agree with it for most 
> desktop systems.  However, I think that admins should have a very strong 
> influence in what tasks the oom killer kills.  It doesn't really matter if 
> it's via oom_adj or not, and it's debatable whether an adjustment on a 
> static heuristic score is in our best interest in the first place.  But we 
> must have an alternative so that our control over oom killing isn't lost.
> 
I won't go too quickly, so let's discuss and rewrite the patches more later.
I'll prepare a new version next week. This week, I'll post swap accounting
and improve the fork-bomb detector.

> I'd also like to open another topic for discussion if you're proposing 
> such sweeping changes: at what point do we allow ~__GFP_NOFAIL allocations 
> to fail even if order < PAGE_ALLOC_COSTLY_ORDER and defer killing 
> anything?  We both agreed that it's not always in the best interest to 
> kill a task so that an allocation can succeed, so we need to define some 
> criteria to simply fail the allocation instead.
> 
Yes, I think the allocation itself (order > 0) should fail more often before we
finally invoke OOM. That tends to be a softer landing than the oom-killer.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29 19:53                                           ` David Rientjes
@ 2009-10-30 13:59                                             ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-30 13:59 UTC (permalink / raw)
  To: David Rientjes
  Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Thu, 29 Oct 2009, Vedran Furac wrote:
> 
>> But then you should rename OOM killer to TRIPK:
>> Totally Random Innocent Process Killer
>>
> 
> The randomness here is the order of the child list when the oom killer 
> selects a task, based on the badness score, and then tries to kill a child 
> with a different mm before the parent.
> 
> The problem you identified in http://pastebin.com/f3f9674a0, however, is a 
> forkbomb issue where the badness score should never have been so high for 
> kdeinit4 compared to "test".  That's directly proportional to adding the 
> scores of all disjoint child total_vm values into the badness score for 
> the parent and then killing the children instead.
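
The accounting described in the quoted paragraph can be sketched as follows.
This is a simplified model, assuming the 2.6-era rule that each child with a
different mm contributes half of its total_vm plus one to the parent's score.
parent_baseline and the mm labels are hypothetical, and children without an
mm (kernel threads) are not modeled:

```sh
#!/bin/sh
# Simplified sketch (assumption: child contribution is total_vm/2 + 1).
#   stdin: one line per child, "CHILD_MM CHILD_TOTAL_VM"
#   $1 = parent's mm label, $2 = parent's total_vm in pages
parent_baseline() {
    awk -v pmm="$1" -v pvm="$2" '
        BEGIN { points = pvm + 0 }
        # only children that do not share the parent mm are counted
        $1 != pmm { points += int($2 / 2) + 1 }
        END { print points }
    '
}
```

For example, a parent with a total_vm of 2000 pages and two children with
their own mm (1000 and 301 pages) gets 2000 + 501 + 151 = 2652 in this model,
which is how a task with many children can outscore a single large allocator.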

Could you explain to me why ntpd invoked the oom killer? Its parent is init.
Or syslog-ng?

> That's the problem, not using total_vm as a baseline.  Replacing that with 
> rss is not going to solve the issue and reducing the user's ability to 
> specify a rough oom priority from userspace is simply not an option.

OK then, if you have a solution, I would be glad to test your patch. I
won't care much if you don't change total_vm as a baseline. Just make
random killing history.

Regards,

Vedran


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 13:59                                             ` Vedran Furač
@ 2009-10-30 19:24                                               ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-30 19:24 UTC (permalink / raw)
  To: vedran.furac
  Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, Vedran Furac wrote:

> > The problem you identified in http://pastebin.com/f3f9674a0, however, is a 
> > forkbomb issue where the badness score should never have been so high for 
> > kdeinit4 compared to "test".  That's directly proportional to adding the 
> > scores of all disjoint child total_vm values into the badness score for 
> > the parent and then killing the children instead.
> 
> Could you explain to me why ntpd invoked the oom killer? Its parent is init.
> Or syslog-ng?
> 

Because it attempted an order-0 GFP_USER allocation and direct reclaim 
could not free any pages.

The task that invoked the oom killer is simply the unlucky task that tried 
an allocation that couldn't be satisfied through direct reclaim.  It's 
usually unrelated to the task chosen for kill unless 
/proc/sys/vm/oom_kill_allocating_task is enabled (which SGI requested to 
avoid excessively long tasklist scans).
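
To check how this knob is set on a given machine, something like the
following can be used. show_vm_knob is a hypothetical helper; it only reads
the file and falls back to a message when the path does not exist:

```sh
#!/bin/sh
# Print the value of a vm sysctl file, defaulting to the knob named above.
#   $1 = optional path to read instead of the default
show_vm_knob() {
    knob=${1:-/proc/sys/vm/oom_kill_allocating_task}
    if [ -r "$knob" ]; then
        printf '%s = %s\n' "$knob" "$(cat "$knob")"
    else
        printf '%s is not available on this kernel\n' "$knob"
    fi
}
```

Calling show_vm_knob with no argument reads the default path; a value of 0
would mean the tasklist is scanned for a victim as described above.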

> > That's the problem, not using total_vm as a baseline.  Replacing that with 
> > rss is not going to solve the issue and reducing the user's ability to 
> > specify a rough oom priority from userspace is simply not an option.
> 
> OK then, if you have a solution, I would be glad to test your patch. I
> won't care much if you don't change total_vm as a baseline. Just make
> random killing history.
> 

The only randomness is in selecting a task that has a different mm from 
the parent in the order of its child list.  Yes, that can be addressed by 
doing a smarter iteration through the children before killing one of them.

Keep in mind that a heuristic as simple as this:

 - kill the task that was started most recently by the same uid, or

 - kill the task that was started most recently on the system if a root
   task calls the oom killer,

would have yielded perfect results for your testcase but isn't necessarily 
something that we'd ever want to see.
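
For concreteness, the simple heuristic above could be sketched like this. It
is only an illustration of the idea, not a proposal and not how the oom
killer selects tasks. The input format (one "PID UID START" line per task,
larger START meaning started more recently) and the name newest_task are
assumptions:

```sh
#!/bin/sh
# Pick the most recently started task for the caller's uid, or across the
# whole system when the caller is root (uid 0).
#   stdin: one line per task, "PID UID START"
#   $1 = uid of the task that triggered the oom killer
newest_task() {
    awk -v uid="$1" '
        (uid == 0 || $2 == uid) && (best == "" || $3 + 0 > max) {
            max = $3 + 0
            best = $1
        }
        END { if (best != "") print best }
    '
}
```

With the tasks "100 1000 50", "101 1000 70" and "102 0 90" on stdin, uid 1000
selects PID 101 and uid 0 selects PID 102.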

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 19:24                                               ` David Rientjes
@ 2009-11-02 19:58                                                 ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-11-02 19:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Fri, 30 Oct 2009, Vedran Furac wrote:
> 
>>> The problem you identified in http://pastebin.com/f3f9674a0, however, is a 
>>> forkbomb issue where the badness score should never have been so high for 
>>> kdeinit4 compared to "test".  That's directly proportional to adding the 
>>> scores of all disjoint child total_vm values into the badness score for 
>>> the parent and then killing the children instead.
>> Could you explain to me why ntpd invoked the oom killer? Its parent is init.
>> Or syslog-ng?
>>
> 
> Because it attempted an order-0 GFP_USER allocation and direct reclaim 
> could not free any pages.
> 
> The task that invoked the oom killer is simply the unlucky task that tried 
> an allocation that couldn't be satisfied through direct reclaim.  It's 
> usually unrelated to the task chosen for kill unless 
> /proc/sys/vm/oom_kill_allocating_task is enabled (which SGI requested to 
> avoid excessively long tasklist scans).

Oh, well, I didn't know that. Maybe rephrasing that part of the
output would help eliminate future misinterpretation.

>> OK then, if you have a solution, I would be glad to test your patch. I
>> won't care much if you don't change total_vm as a baseline. Just make
>> random killing history.
> 
> The only randomness is in selecting a task that has a different mm from 
> the parent in the order of its child list.  Yes, that can be addressed by 
> doing a smarter iteration through the children before killing one of them.
> 
> Keep in mind that a heuristic as simple as this:
> 
>  - kill the task that was started most recently by the same uid, or
> 
>  - kill the task that was started most recently on the system if a root
>    task calls the oom killer,
> 
> would have yielded perfect results for your testcase but isn't necessarily 
> something that we'd ever want to see.

Of course, I want an algorithm that works well in all possible situations.

Regards,

Vedran


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  4:08                           ` David Rientjes
@ 2009-10-28 13:28                             ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-28 13:28 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Wed, 28 Oct 2009, Vedran Furac wrote:
> 
>>> This is wrong; it doesn't "emulate oom" since oom_kill_process() always 
>>> kills a child of the selected process instead if they do not share the 
>>> same memory.  The chosen task in that case is untouched.
>> OK, I stand corrected then. Thanks! But, while testing this I lost X
>> once again and "test" survived for some time (check the timestamps):
>>
>> http://pastebin.com/d5c9d026e
>>
>> - It started by killing gkrellm(!!!)
>> - Then I lost X (kdeinit4 I guess)
>> - Then 103 seconds after the killing started, it killed "test" - the
>> real culprit.
>>
>> I mean... how?!
>>
> 
> Here are the five oom kills that occurred in your log, and notice that the 
> first four times it kills a child and not the actual task as I explained:

Yes, but four times wrong.

> Those are practically happening simultaneously with very little memory 
> being available between each oom kill.  Only later is "test" killed:
> 
> [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
> [97240.206832] Killed process 5005 (test)
> 
> Notice how the badness score is less than 1/4th of the others.  So while 
> you may find it to be hogging a lot of memory, there were others that 
> consumed much more.
^^^^^^^^^^^^^^^^^^^^^

This is just wrong. I have 3.5GB of RAM, and free says that 2GB are empty
(ignoring cache). The culprit then allocates all free memory (2GB). That
means it is using *more* than all other processes *together*. There
cannot be any other "that consumed much more".

> You can get a more detailed understanding of this by doing
> 
> 	echo 1 > /proc/sys/vm/oom_dump_tasks
> 
> before trying your testcase; it will show various information like the 
> total_vm

Looking at total_vm (VIRT in top/vsize in ps?) is completely wrong. If I
sum up those numbers for every process running I would get:

% ps -eo pid,vsize,command | awk '{ SUM += $2 } END { print SUM/1024/1024 }'
14.7935

14GB. And I only have 3GB. I usually use exmap to get realistic numbers:

http://www.berthels.co.uk/exmap/doc.html
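
The same comparison can be sketched in Python by summing VmSize and VmRSS
from /proc/<pid>/status. The function names are made up for this example,
and the summed rss still counts shared pages once per process, so it is
only a rough figure (exmap accounts for sharing properly).

```python
import os
import re


def parse_status_kb(text):
    """Return (VmSize, VmRSS) in kB from the text of a /proc/<pid>/status file.

    Kernel threads have neither field, so they contribute (0, 0).
    """
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields.get("VmSize", 0), fields.get("VmRSS", 0)


def sum_memory_gb(proc_root="/proc"):
    """Sum virtual size and resident size over all tasks, in GB."""
    total_vsize = total_rss = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                vsize, rss = parse_status_kb(f.read())
        except OSError:
            continue  # the task exited while we were scanning
        total_vsize += vsize
        total_rss += rss
    return total_vsize / 1024 / 1024, total_rss / 1024 / 1024
```

Calling sum_memory_gb() on a desktop session should show the vsize sum far
above installed RAM while the rss sum stays below it.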

> and oom_adj value for each task at the time of oom (and the 
> actual badness score is exported per-task via /proc/pid/oom_score in 
> real-time).  This will also include the rss and show what the end result 
> would be in using that value as part of the heuristic on this particular 
> workload compared to the current implementation.

Thanks, I'll try that... but I guess that using rss would yield better
results.
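
A small sketch for watching the per-task score mentioned above: it reads
/proc/<pid>/oom_score for every task and sorts the result. The function
name is invented for this example, and the task name is taken from the
first line of /proc/<pid>/status, which I am assuming starts with "Name:".

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return a list of (oom_score, pid, name), highest score first."""
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "status")) as f:
                # First line looks like "Name:\t<command name>"
                name = f.readline().split(":", 1)[1].strip()
        except (OSError, ValueError, IndexError):
            continue  # task exited or file not readable
        scores.append((score, int(entry), name))
    return sorted(scores, reverse=True)
```

The first few entries of read_oom_scores() are the tasks the kernel would
currently consider first.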


Regards,

Vedran

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28 13:28                             ` Vedran Furač
@ 2009-10-28 20:10                               ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-28 20:10 UTC (permalink / raw)
  To: Vedran Furac
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, Vedran Furac wrote:

> > Those are practically happening simultaneously with very little memory 
> > being available between each oom kill.  Only later is "test" killed:
> > 
> > [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
> > [97240.206832] Killed process 5005 (test)
> > 
> > Notice how the badness score is less than 1/4th of the others.  So while 
> > you may find it to be hogging a lot of memory, there were others that 
> > consumed much more.
> ^^^^^^^^^^^^^^^^^^^^^
> 
> This is just wrong. I have 3.5GB of RAM, free says that 2GB are empty
> (ignoring cache). Culprit then allocates all free memory (2GB). That
> means it is using *more* than all other processes *together*. There
> cannot be any other "that consumed much more".
> 

Just post the oom killer results after using echo 1 > 
/proc/sys/vm/oom_dump_tasks as requested and it will clarify why those 
tasks were chosen to kill.  It will also show the result of using rss 
instead of total_vm and allow us to see how such a change would have 
changed the killing order for your workload.

> Thanks, I'll try that... but I guess that using rss would yield better
> results.
> 

We would know if you posted the data.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28 20:10                               ` David Rientjes
@ 2009-10-29  3:05                                 ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-29  3:05 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> We would know if you posted the data.

I need to find some free time to destroy a session on a computer which I
use for work. You could easily test it yourself also as this doesn't
happen only to me.

Anyways, here it is... this time it started with ntpd:

http://pastebin.com/f3f9674a0

Regards,

Vedran

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29  3:05                                 ` Vedran Furač
@ 2009-10-29  8:35                                   ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-29  8:35 UTC (permalink / raw)
  To: vedran.furac
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009, Vedran Furac wrote:

> > We would know if you posted the data.
> 
> I need to find some free time to destroy a session on a computer which I
> use for work. You could easily test it yourself also as this doesn't
> happen only to me.
> 
> Anyways, here it is... this time it started with ntpd:
> 
> http://pastebin.com/f3f9674a0
> 

That oom log shows 12 ooms but no tasks actually appear to be getting 
killed (no "Killed process 1234 (task)" lines are found).  Do you have any 
idea why?

Anyway, as I posted in response to KAMEZAWA-san's patch, the change to 
get_mm_rss(mm) prefers Xorg more than the current implementation.

From your log at the link above:

total_vm
669624 test
195695 krunner
187342 krusader
168881 plasma-desktop
130562 ktorrent
127081 knotify4
125881 icedove-bin
123036 akregator

rss
668738 test
42191 Xorg
30761 firefox-bin
13331 icedove-bin
10234 ktorrent
9263 akregator
8864 plasma-desktop
7532 krunner

Can you explain why Xorg is preferred as a baseline to kill rather than 
krunner in your example?
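
To put the figures above into more familiar units, here is a rough
conversion sketch. It assumes that the total_vm and rss columns are page
counts and that pages are 4 KiB; both are assumptions for this example,
not something the log itself states.

```python
PAGE_SIZE = 4096  # bytes; assumed page size

# rss values copied from the list above
RSS_PAGES = {
    "test": 668738,
    "Xorg": 42191,
    "firefox-bin": 30761,
    "icedove-bin": 13331,
    "ktorrent": 10234,
    "akregator": 9263,
    "plasma-desktop": 8864,
    "krunner": 7532,
}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


def report(pages_by_task):
    """Return (name, MiB) pairs, largest first, rounded to one decimal."""
    ranked = sorted(pages_by_task.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, round(pages_to_mib(pages), 1)) for name, pages in ranked]
```

Under those assumptions "test" holds about 2.5 GiB resident, Xorg about
165 MiB, and krunner a little under 30 MiB.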

Thanks.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29  8:35                                   ` David Rientjes
@ 2009-10-29 11:01                                     ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-29 11:01 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Thu, 29 Oct 2009, Vedran Furac wrote:
> 
>>> We would know if you posted the data.
>> I need to find some free time to destroy a session on a computer which I
>> use for work. You could easily test it yourself also as this doesn't
>> happen only to me.
>>
>> Anyways, here it is... this time it started with ntpd:
>>
>> http://pastebin.com/f3f9674a0
>>
> 
> That oom log shows 12 ooms but no tasks actually appear to be getting 
> killed (there're no "Killed process 1234 (task)" found).  Do you have any 
> idea why?

That's /var/log/messages. I posted it rather than dmesg because the whole
log didn't fit in the dmesg buffer. Here is what I have (compare timestamps):

% dmesg|grep -i kill

[ 1493.064458] Out of memory: kill process 6304 (kdeinit4) score 1190231 or a child
[ 1493.064467] Killed process 6409 (konqueror)
[ 1493.261149] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1493.261166]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1493.276528] Out of memory: kill process 6304 (kdeinit4) score 1161265 or a child
[ 1493.276538] Killed process 6411 (krusader)
[ 1499.221160] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1499.221178]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1499.236431] Out of memory: kill process 6304 (kdeinit4) score 1067593 or a child
[ 1499.236441] Killed process 6412 (irexec)
[ 1499.370192] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1499.370209]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1499.385417] Out of memory: kill process 6304 (kdeinit4) score 1066861 or a child
[ 1499.385427] Killed process 6420 (xchm)
[ 1499.458304] kio_file invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1499.458333]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1499.458367]  [<ffffffff81120900>] ? d_kill+0x5c/0x7c
[ 1499.473573] Out of memory: kill process 6304 (kdeinit4) score 1043690 or a child
[ 1499.473582] Killed process 6425 (kio_file)
[ 1500.250746] korgac invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.250765]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.266186] Out of memory: kill process 6304 (kdeinit4) score 1020350 or a child
[ 1500.266196] Killed process 6464 (icedove)
[ 1500.349355] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.349371]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.364689] Out of memory: kill process 6304 (kdeinit4) score 1019864 or a child
[ 1500.364699] Killed process 6477 (kio_http)
[ 1500.452151] kded4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.452167]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.452196]  [<ffffffff81120900>] ? d_kill+0x5c/0x7c
[ 1500.467307] Out of memory: kill process 6304 (kdeinit4) score 993142 or a child
[ 1500.467316] Killed process 6478 (kio_http)
[ 1500.780222] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.780239]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.796280] Out of memory: kill process 6304 (kdeinit4) score 966331 or a child
[ 1500.796290] Killed process 6484 (kio_http)
[ 1501.065374] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1501.065390]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1501.080579] Out of memory: kill process 6304 (kdeinit4) score 939434 or a child
[ 1501.080587] Killed process 6486 (kio_http)
[ 1501.381188] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1501.381204]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1501.396338] Out of memory: kill process 6304 (kdeinit4) score 912691 or a child
[ 1501.396346] Killed process 6487 (firefox-bin)
[ 1502.661294] icedove-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1502.661311]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1502.676563] Out of memory: kill process 7580 (test) score 708945 or a child
[ 1502.676575] Killed process 7580 (test)


> Can you explain why Xorg is preferred as a baseline to kill rather than 
> krunner in your example?

Krunner is a small app for running other apps and doing similar things. It
shouldn't use a lot of memory. OTOH, Xorg has to hold all the pixmaps
and so on. That was the expected result. First Xorg, then firefox and
thunderbird.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29 11:01                                     ` Vedran Furač
@ 2009-10-29 19:42                                       ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-29 19:42 UTC (permalink / raw)
  To: vedran.furac
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009, Vedran Furac wrote:

> [ 1493.064458] Out of memory: kill process 6304 (kdeinit4) score 1190231 or a child
> [ 1493.064467] Killed process 6409 (konqueror)
> [ 1493.261149] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1493.261166]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1493.276528] Out of memory: kill process 6304 (kdeinit4) score 1161265 or a child
> [ 1493.276538] Killed process 6411 (krusader)
> [ 1499.221160] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1499.221178]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.236431] Out of memory: kill process 6304 (kdeinit4) score 1067593 or a child
> [ 1499.236441] Killed process 6412 (irexec)
> [ 1499.370192] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1499.370209]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.385417] Out of memory: kill process 6304 (kdeinit4) score 1066861 or a child
> [ 1499.385427] Killed process 6420 (xchm)
> [ 1499.458304] kio_file invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1499.458333]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.458367]  [<ffffffff81120900>] ? d_kill+0x5c/0x7c
> [ 1499.473573] Out of memory: kill process 6304 (kdeinit4) score 1043690 or a child
> [ 1499.473582] Killed process 6425 (kio_file)
> [ 1500.250746] korgac invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.250765]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.266186] Out of memory: kill process 6304 (kdeinit4) score 1020350 or a child
> [ 1500.266196] Killed process 6464 (icedove)
> [ 1500.349355] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.349371]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.364689] Out of memory: kill process 6304 (kdeinit4) score 1019864 or a child
> [ 1500.364699] Killed process 6477 (kio_http)
> [ 1500.452151] kded4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.452167]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.452196]  [<ffffffff81120900>] ? d_kill+0x5c/0x7c
> [ 1500.467307] Out of memory: kill process 6304 (kdeinit4) score 993142 or a child
> [ 1500.467316] Killed process 6478 (kio_http)
> [ 1500.780222] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.780239]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.796280] Out of memory: kill process 6304 (kdeinit4) score 966331 or a child
> [ 1500.796290] Killed process 6484 (kio_http)
> [ 1501.065374] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1501.065390]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1501.080579] Out of memory: kill process 6304 (kdeinit4) score 939434 or a child
> [ 1501.080587] Killed process 6486 (kio_http)
> [ 1501.381188] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1501.381204]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1501.396338] Out of memory: kill process 6304 (kdeinit4) score 912691 or a child
> [ 1501.396346] Killed process 6487 (firefox-bin)
> [ 1502.661294] icedove-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1502.661311]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1502.676563] Out of memory: kill process 7580 (test) score 708945 or a child
> [ 1502.676575] Killed process 7580 (test)
> 

Ok, so this is the forkbomb problem caused by adding half of each child's 
total_vm into the badness score of the parent.  We should address this 
completely separately by addressing that specific part of the heuristic, 
not changing what we consider to be a baseline.

The rationale is quite simple: we'll still experience the same problem 
with rss as we did with total_vm in the forkbomb scenario above on certain 
workloads (maybe not yours, but others).  The oom killer always kills a 
child first if it has a different mm than the selected parent, so the 
amount of memory freed as a result is entirely dependent on the 
order of the child list.  It may be very little, yet the child is killed 
because its siblings had large total_vm values.
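
A deliberately simplified model of that behaviour, in Python rather than
kernel C, might look like the sketch below. It keeps only the two pieces
described here (half of each child's total_vm added to the parent, and the
first child with a different mm being killed) and ignores every other
input to the real badness score. The Task class, the mm_id field, and the
function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    comm: str
    total_vm: int   # pages
    mm_id: int      # stands in for the identity of the task's mm
    children: list = field(default_factory=list)


def simplified_badness(task):
    """total_vm plus half of the total_vm of each child with its own mm."""
    points = task.total_vm
    for child in task.children:
        if child.mm_id != task.mm_id:
            points += child.total_vm // 2
    return points


def select_and_kill(tasks):
    """Return (selected, killed) under the simplified model."""
    selected = max(tasks, key=simplified_badness)
    for child in selected.children:
        if child.mm_id != selected.mm_id:
            # The first such child in list order dies, however small it is.
            return selected, child
    return selected, selected
```

With a launcher that has many moderately sized children, the launcher
outscores a single large task, and the child that dies is whichever one
happens to be first in its child list.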

So instead of focusing on rss, we simply need to find a better heuristic 
for the forkbomb issue which I've already proposed a very trivial solution 
for.  Then, afterwards, we can debate about how the scoring heuristic can 
be changed to select better tasks (and perhaps remove a lot of the clutter 
that's there currently!).

> > Can you explain why Xorg is preferred as a baseline to kill rather than 
> > krunner in your example?
> 
> Krunner is a small app for running other apps and doing similar things. It
> shouldn't use a lot of memory. OTOH, Xorg has to hold all the pixmaps
> and so on. That was the expected result. First Xorg, then firefox and
> thunderbird.
> 

You're making all these claims and assertions based _solely_ on the theory 
that killing the application with the most resident RAM is always the 
optimal solution.  That's just not true, especially if we're just 
allocating small numbers of order-0 memory.

Much better is to allow the user to decide at what point, regardless of 
swap usage, their application is using much more memory than expected or 
required.  They can do that right now pretty well with /proc/pid/oom_adj 
without this outlandish claim that they should be expected to know the rss 
of their applications at the time of oom to effectively tune oom_adj.
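
As a concrete illustration of that tuning, here is a minimal Python sketch
that writes a value to /proc/<pid>/oom_adj. It assumes the range used by
kernels of this era, -16 to +15 with -17 disabling oom killing for the
task; the function and constant names are made up for this example, and
later kernels prefer oom_score_adj instead.

```python
import os

OOM_DISABLE = -17      # exempts the task from the oom killer
OOM_ADJUST_MIN = -16
OOM_ADJUST_MAX = 15


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write value to <proc_root>/<pid>/oom_adj and return the path used.

    Lowering the value below its current setting normally requires root.
    """
    if value != OOM_DISABLE and not OOM_ADJUST_MIN <= value <= OOM_ADJUST_MAX:
        raise ValueError(f"oom_adj must be -17 or within -16..15, got {value}")
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

For example, set_oom_adj(os.getpid(), 10) makes the calling process a more
likely target, while set_oom_adj(pid_of_Xorg, -17) would protect Xorg
outright.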

What would you suggest?  A script that sits in a loop checking each task's 
current rss from /proc/pid/stat or their current oom priority through 
/proc/pid/oom_score and adjusting oom_adj preemptively just in case the 
oom killer is invoked in the next second?

And that "small app" has 30MB of rss which could be freed, if killed, and 
utilized for subsequent page allocations.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
@ 2009-10-29 19:42                                       ` David Rientjes
  0 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-29 19:42 UTC (permalink / raw)
  To: vedran.furac
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009, Vedran Furac wrote:

> [ 1493.064458] Out of memory: kill process 6304 (kdeinit4) score 1190231
> or a child
> [ 1493.064467] Killed process 6409 (konqueror)
> [ 1493.261149] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1493.261166]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1493.276528] Out of memory: kill process 6304 (kdeinit4) score 1161265
> or a child
> [ 1493.276538] Killed process 6411 (krusader)
> [ 1499.221160] akregator invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1499.221178]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.236431] Out of memory: kill process 6304 (kdeinit4) score 1067593
> or a child
> [ 1499.236441] Killed process 6412 (irexec)
> [ 1499.370192] firefox-bin invoked oom-killer: gfp_mask=0x201da,
> order=0, oomkilladj=0
> [ 1499.370209]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.385417] Out of memory: kill process 6304 (kdeinit4) score 1066861
> or a child
> [ 1499.385427] Killed process 6420 (xchm)
> [ 1499.458304] kio_file invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1499.458333]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.458367]  [<ffffffff81120900>] ? d_kill+0x5c/0x7c
> [ 1499.473573] Out of memory: kill process 6304 (kdeinit4) score 1043690
> or a child
> [ 1499.473582] Killed process 6425 (kio_file)
> [ 1500.250746] korgac invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1500.250765]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.266186] Out of memory: kill process 6304 (kdeinit4) score 1020350
> or a child
> [ 1500.266196] Killed process 6464 (icedove)
> [ 1500.349355] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1500.349371]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.364689] Out of memory: kill process 6304 (kdeinit4) score 1019864
> or a child
> [ 1500.364699] Killed process 6477 (kio_http)
> [ 1500.452151] kded4 invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1500.452167]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.452196]  [<ffffffff81120900>] ? d_kill+0x5c/0x7c
> [ 1500.467307] Out of memory: kill process 6304 (kdeinit4) score 993142
> or a child
> [ 1500.467316] Killed process 6478 (kio_http)
> [ 1500.780222] akregator invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1500.780239]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.796280] Out of memory: kill process 6304 (kdeinit4) score 966331
> or a child
> [ 1500.796290] Killed process 6484 (kio_http)
> [ 1501.065374] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1501.065390]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1501.080579] Out of memory: kill process 6304 (kdeinit4) score 939434
> or a child
> [ 1501.080587] Killed process 6486 (kio_http)
> [ 1501.381188] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0,
> oomkilladj=0
> [ 1501.381204]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1501.396338] Out of memory: kill process 6304 (kdeinit4) score 912691
> or a child
> [ 1501.396346] Killed process 6487 (firefox-bin)
> [ 1502.661294] icedove-bin invoked oom-killer: gfp_mask=0x201da,
> order=0, oomkilladj=0
> [ 1502.661311]  [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1502.676563] Out of memory: kill process 7580 (test) score 708945 or a
> child
> [ 1502.676575] Killed process 7580 (test)
> 

Ok, so this is the forkbomb problem, caused by adding half of each child's
total_vm to the badness score of the parent.  We should address this
completely separately by fixing that specific part of the heuristic,
not by changing what we consider to be a baseline.

The rationale is quite simple: we'll still experience the same problem
with rss as we did with total_vm in the forkbomb scenario above on certain
workloads (maybe not yours, but others).  The oom killer always kills a
child first if it has a different mm than the selected parent, so the
amount of memory freed as a result depends entirely on the order of the
child list.  The killed child may free very little memory, yet it is
killed because its siblings had large total_vm values.
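
A minimal sketch of the heuristic described above, assuming a much-simplified
model: the baseline is a task's total_vm, and half of each child's total_vm is
added when the child has a different mm. The Task class, its field names and
both functions are hypothetical illustrations, not kernel code; the real
badness() calculation weighs several other factors that are left out here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical stand-in for a process; not a kernel structure."""
    name: str
    total_vm: int   # virtual memory size, in pages
    mm_id: int      # stands in for the identity of the task's mm
    children: List["Task"] = field(default_factory=list)


def parent_score(task: Task) -> int:
    """Baseline total_vm plus half of each child's total_vm (different mm only)."""
    score = task.total_vm
    for child in task.children:
        if child.mm_id != task.mm_id:
            score += child.total_vm // 2
    return score


def pick_victim(selected: Task) -> Task:
    """Return the first child with a different mm, else the selected task itself."""
    for child in selected.children:
        if child.mm_id != selected.mm_id:
            return child
    return selected


# A launcher-like parent with little memory of its own but two children.
launcher = Task("launcher", total_vm=1_000, mm_id=1, children=[
    Task("small_child", total_vm=2_000, mm_id=2),
    Task("big_child", total_vm=400_000, mm_id=3),
])
print(parent_score(launcher))      # 1000 + 1000 + 200000 = 202000
print(pick_victim(launcher).name)  # small_child: first in the child list
```

In this toy model the parent is selected mostly because of big_child, yet the
task that dies is small_child, simply because it comes first in the child
list. That is the same pattern as the kdeinit4 entries in the log quoted above.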

So instead of focusing on rss, we simply need to find a better heuristic 
for the forkbomb issue which I've already proposed a very trivial solution 
for.  Then, afterwards, we can debate about how the scoring heuristic can 
be changed to select better tasks (and perhaps remove a lot of the clutter 
that's there currently!).

> > Can you explain why Xorg is preferred as a baseline to kill rather than 
> > krunner in your example?
> 
> Krunner is a small app for running other apps and do similar things. It
> shouldn't use a lot of memory. OTOH, Xorg has to hold all the pixmaps
> and so on. That was expected result. Fist Xorg, then firefox and
> thunderbird.
> 

You're making all these claims and assertions based _solely_ on the theory 
that killing the application with the most resident RAM is always the 
optimal solution.  That's just not true, especially if we're just 
allocating small numbers of order-0 memory.

Much better is to allow the user to decide at what point, regardless of 
swap usage, their application is using much more memory than expected or 
required.  They can do that right now pretty well with /proc/pid/oom_adj 
without this outlandish claim that they should be expected to know the rss 
of their applications at the time of oom to effectively tune oom_adj.
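
For illustration, a minimal sketch of what tuning a single task through
/proc/pid/oom_adj looks like from user space. It assumes the oom_adj range of
kernels from this period, -17 to 15, where -17 disables oom killing for the
task. The helper names are hypothetical, and lowering the value generally
needs elevated privileges.

```python
OOM_DISABLE = -17   # value that exempts a task from the oom killer
OOM_ADJ_MIN = -16   # least likely to be killed, short of disabling
OOM_ADJ_MAX = 15    # most likely to be killed


def oom_adj_path(pid: int, proc_root: str = "/proc") -> str:
    """Build the path of the per-task tunable."""
    return f"{proc_root}/{pid}/oom_adj"


def check_oom_adj(value: int) -> int:
    """Return value unchanged if it is a legal oom_adj, else raise ValueError."""
    if value != OOM_DISABLE and not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(f"oom_adj must be {OOM_DISABLE}..{OOM_ADJ_MAX}, got {value}")
    return value


def set_oom_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write a validated oom_adj value for the given pid."""
    with open(oom_adj_path(pid, proc_root), "w") as f:
        f.write(f"{check_oom_adj(value)}\n")
```

With these helpers, set_oom_adj(6304, 10) would make pid 6304 a more likely
victim without anyone needing to know its rss at the time of the oom.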

What would you suggest?  A script that sits in a loop checking each task's
current rss from /proc/pid/stat or their current oom priority through
/proc/pid/oom_score and adjusting oom_adj preemptively just in case the
oom killer is invoked in the next second?

And that "small app" has 30MB of rss which could be freed, if killed, and 
utilized for subsequent page allocations.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-29 19:42                                       ` David Rientjes
@ 2009-10-30 13:53                                         ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-30 13:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> Ok, so this is the forkbomb problem by adding half of each child's 
> total_vm into the badness score of the parent.  We should address this 
> completely seperately by addressing that specific part of the heuristic, 
> not changing what we consider to be a baseline.
> thunderbird.
>
> You're making all these claims and assertions based _solely_ on the theory 
> that killing the application with the most resident RAM is always the 
> optimal solution.  That's just not true, especially if we're just 
> allocating small numbers of order-0 memory.

Well, you are the kernel hacker, not me. You know how Linux mm works much
better than I do. I just reported what I think is a big problem, which
needs to be solved ASAP (2.6.33). I'm afraid that we'll just talk a lot
and nothing will be done, with the solution/fix postponed indefinitely.
Not sure if you are interested, but I also tested this on Windows XP, and
nothing bad happens there; the system continues to function properly.

For 2-3 years I had memory overcommit turned off. I didn't get any OOM,
but sometimes Java didn't work, and it seems that because of some kernel
weirdness (or a misunderstanding on my part) I couldn't use all the
available memory:

# echo 2 > /proc/sys/vm/overcommit_memory

# echo 95 > /proc/sys/vm/overcommit_ratio
% ./test  /* malloc in loop as before */
malloc: Cannot allocate memory /* Great, no OOM, but: */

% free -m
          total       used       free     shared    buffers     cached
Mem:      3458        3429         29          0        102       1119
-/+ buffers/cache:    2207       1251

There's plenty of memory available. Shouldn't cache be automatically
dropped (this question was in my original mail, hence the subject)?
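
The "-/+ buffers/cache" row can be reproduced from the "Mem:" row: buffers and
cached are subtracted from used and added to free. A small sketch using the
numbers from the output above; the function name is hypothetical, and the
one-megabyte differences from the printed row are presumably rounding, since
free works from kB values.

```python
def adjust_for_cache(used, free, buffers, cached):
    """Return (used, free) as they would be if buffers and cache were released."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# "Mem:" row from the free -m output above, in MB.
used, free = adjust_for_cache(used=3429, free=29, buffers=102, cached=1119)
print(used, free)  # 2208 1250 (free itself printed 2207 and 1251)
```

This row describes physical memory only. It says nothing about how much
address space has already been committed, which is what overcommit mode 2
checks.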

All this frustrated not only me, but a great number of users on our
local Croatian Linux Usenet newsgroup, with some of them pointing to that
as the reason they use Solaris. And so on...

> Much better is to allow the user to decide at what point, regardless of 
> swap usage, their application is using much more memory than expected or 
> required.  They can do that right now pretty well with /proc/pid/oom_adj 
> without this outlandish claim that they should be expected to know the rss 
> of their applications at the time of oom to effectively tune oom_adj.

Believe me, barely a few developers use oom_adj for their applications,
and probably almost none of the end users. What should they do? Every
time they start an application, go to the console and set oom_adj? You
cannot expect them to do that.

> What would you suggest?  A script that sits in a loop checking each task's 
> current rss from /proc/pid/stat or their current oom priority though 
> /proc/pid/oom_score and adjusting oom_adj preemptively just in case the 
> oom killer is invoked in the next second?

:)

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 13:53                                         ` Vedran Furač
@ 2009-10-30 14:08                                           ` Thomas Fjellstrom
  -1 siblings, 0 replies; 128+ messages in thread
From: Thomas Fjellstrom @ 2009-10-30 14:08 UTC (permalink / raw)
  To: linux-kernel, vedran.furac
  Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri October 30 2009, Vedran Furač wrote:
> David Rientjes wrote:
> > Ok, so this is the forkbomb problem by adding half of each child's
> > total_vm into the badness score of the parent.  We should address this
> > completely seperately by addressing that specific part of the
> > heuristic, not changing what we consider to be a baseline.
> > thunderbird.
> >
> > You're making all these claims and assertions based _solely_ on the
> > theory that killing the application with the most resident RAM is
> > always the optimal solution.  That's just not true, especially if we're
> > just allocating small numbers of order-0 memory.
> 
> Well, you are kernel hacker, not me. You know how linux mm works much
> more than I do. I just reported a, what I think is a big problem, which
> needs to be solved ASAP (2.6.33). I'm afraid that we'll just talk much
> and nothing will be done with solution/fix postponed indefinitely. Not
> sure if you are interested, but I tested this on windowsxp also, and
> nothing bad happens there, system continues to function properly.
> 
> For 2-3 years I had memory overcommit turn off. I didn't get any OOM,
> but sometimes Java didn't work and it seems that because of some kernel
> weirdness (or misunderstanding on my part) I couldn't use all the
> available memory:
> 
> # echo 2 > /proc/sys/vm/overcommit_memory
> 
> # echo 95 > /proc/sys/vm/overcommit_ratio
> % ./test  /* malloc in loop as before */
> malloc: Cannot allocate memory /* Great, no OOM, but: */
> 
> % free -m
>           total       used       free     shared    buffers     cached
> Mem:      3458        3429         29          0        102       1119
> -/+ buffers/cache:    2207       1251
> 
> There's plenty of memory available. Shouldn't cache be automatically
> dropped (this question was in my original mail, hence the subject)?
> 
> All this frustrated not only me, but a great number of users on our
> local Croatian linux usenet newsgroup with some of them pointing that as
> the reason they use solaris. And so on...

I think this is the MOST serious issue related to the oom killer. For some
reason it refuses to drop pages before trying to kill, when it should drop
cache first, THEN kill if needed.

> > Much better is to allow the user to decide at what point, regardless of
> > swap usage, their application is using much more memory than expected
> > or required.  They can do that right now pretty well with
> > /proc/pid/oom_adj without this outlandish claim that they should be
> > expected to know the rss of their applications at the time of oom to
> > effectively tune oom_adj.
> 
> Believe me, barely a few developers use oom_adj for their applications,
> and probably almost none of the end users. What should they do, every
> time they start an application, go to console and set the oom_adj. You
> cannot expect them to do that.
> 
> > What would you suggest?  A script that sits in a loop checking each
> > task's current rss from /proc/pid/stat or their current oom priority
> > though /proc/pid/oom_score and adjusting oom_adj preemptively just in
> > case the oom killer is invoked in the next second?
> >
> :)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
>  in the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 14:08                                           ` Thomas Fjellstrom
@ 2009-10-30 15:13                                             ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-30 15:13 UTC (permalink / raw)
  To: tfjellstrom
  Cc: linux-kernel, David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki,
	linux-mm, KOSAKI Motohiro, minchan.kim, Andrew Morton,
	Andrea Arcangeli

Thomas Fjellstrom wrote:

>> malloc: Cannot allocate memory /* Great, no OOM, but: */
>> 
>> % free -m
>>           total       used       free     shared    buffers     cached
>> Mem:      3458        3429         29          0        102       1119
>> -/+ buffers/cache:    2207       1251
>> 
>> There's plenty of memory available. Shouldn't cache be
>> automatically dropped (this question was in my original mail, hence
>> the subject)?
>> 
> 
> I think this is the MOST serious issue related to the oom killer. For
> some reason it refuses to drop pages before trying to kill. When it
> should drop cache, THEN kill if needed.

This isn't about OOM, but about the situation when you turn off
overcommit. I was jumping to conclusions here. You can drop caches
manually with:
# echo 1 > /proc/sys/vm/drop_caches

but you still get: "malloc: Cannot allocate memory" even if almost
nothing is cached:

        total       used       free     shared    buffers     cached
Mem:    3458       2210       1248          0          3          90
-/+ buffers/cache: 2116       1342

As for the kernel not dropping pages before killing, I don't know anything
about it. It happens so fast and I never tried to measure it.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 13:53                                         ` Vedran Furač
@ 2009-10-30 14:12                                           ` Andrea Arcangeli
  -1 siblings, 0 replies; 128+ messages in thread
From: Andrea Arcangeli @ 2009-10-30 14:12 UTC (permalink / raw)
  To: Vedran Furač
  Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

On Fri, Oct 30, 2009 at 02:53:33PM +0100, Vedran Furač wrote:
> % free -m
>           total       used       free     shared    buffers     cached
> Mem:      3458        3429         29          0        102       1119
> -/+ buffers/cache:    2207       1251
> 
> There's plenty of memory available. Shouldn't cache be automatically
> dropped (this question was in my original mail, hence the subject)?

This is not about cache. The cache amount is physical; this is about the
virtual amount that can only go in ram or swap (at any later time, the
current time is irrelevant) vs "ram + swap". In short, add more swap if
you don't like overcommit, and check "grep Commit /proc/meminfo" in case
this is an accounting bug...
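
To make the "ram + swap" comparison concrete: with overcommit_memory set to 2,
an allocation is refused once the committed address space would pass a limit
derived from swap plus a percentage (overcommit_ratio) of RAM. The sketch
below assumes that simplified formula and ignores huge pages and the further
adjustments the real check makes. The function names and example sizes are
illustrative only.

```python
def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int) -> int:
    """Simplified CommitLimit: swap plus overcommit_ratio percent of RAM."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def allocation_allowed(request_kb: int, committed_kb: int, limit_kb: int) -> bool:
    """Strict accounting: refuse once committed + request reaches the limit."""
    return committed_kb + request_kb < limit_kb


# Hypothetical machine: 4 GiB of RAM, overcommit_ratio of 95.
ram_kb = 4 * 1024 * 1024
no_swap = commit_limit_kb(ram_kb, swap_kb=0, overcommit_ratio=95)
with_swap = commit_limit_kb(ram_kb, swap_kb=2 * 1024 * 1024, overcommit_ratio=95)
print(no_swap, with_swap)  # 3984588 6081740
```

Adding swap raises the limit directly, which is why it is the suggested
remedy here.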

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 14:12                                           ` Andrea Arcangeli
@ 2009-10-30 14:41                                             ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-10-30 14:41 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

Andrea Arcangeli wrote:

> On Fri, Oct 30, 2009 at 02:53:33PM +0100, Vedran Furač wrote:
>> % free -m
>>           total       used       free     shared    buffers     cached
>> Mem:      3458        3429         29          0        102       1119
>> -/+ buffers/cache:    2207       1251
>>
>> There's plenty of memory available. Shouldn't cache be automatically
>> dropped (this question was in my original mail, hence the subject)?
> 
> This is not about cache, cache amount is physical, this about
> virtual amount that can only go in ram or swap (at any later time,
> current time is irrelevant) vs "ram + swap".

Oh... so this is because apps "reserve" (Committed_AS?) more than they
currently need.

> In short add more swap if
> you don't like overcommit and check grep Commit /proc/meminfo in case
> this is accounting bug...

At the time of "malloc: Cannot allocate memory":

CommitLimit:     3364440 kB
Committed_AS:    3240200 kB

So probably everything is ok (and free is misleading). Overcommit is
unfortunately necessary if I want to be able to use all my memory.
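
The remaining headroom follows directly from these two fields. A minimal
sketch that parses them out of /proc/meminfo-style text; the function names
are hypothetical, and the sample text contains only the two values quoted
above.

```python
def parse_meminfo_kb(text: str) -> dict:
    """Map field names to integer values for lines shaped like 'Name:   123 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields: dict) -> int:
    """Address space that can still be committed before allocations fail."""
    return fields["CommitLimit"] - fields["Committed_AS"]


sample = """\
CommitLimit:     3364440 kB
Committed_AS:    3240200 kB
"""
print(commit_headroom_kb(parse_meminfo_kb(sample)))  # 124240
```

On a live system the same functions can be fed open("/proc/meminfo").read().
With the values above, only about 121 MB of address space was still available
to commit, so a malloc loop fails quickly even though free shows more than
1 GB as free once buffers and cache are discounted.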

Btw. http://www.redhat.com/advice/tips/meminfo.html says Committed_AS is
a (gu)estimate. Hope it is a good (not too high) guesstimate. :)

Regards,

Vedran


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 14:41                                             ` Vedran Furač
@ 2009-10-30 15:15                                               ` Andrea Arcangeli
  -1 siblings, 0 replies; 128+ messages in thread
From: Andrea Arcangeli @ 2009-10-30 15:15 UTC (permalink / raw)
  To: Vedran Furač
  Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

On Fri, Oct 30, 2009 at 03:41:12PM +0100, Vedran Furač wrote:
> Oh... so this is because apps "reserve" (Committed_AS?) more then they
> currently need.

They don't actually reserve, they end up "reserving" if overcommit is
set to 2 (OVERCOMMIT_NEVER)... Apps aren't reserving; more likely they
simply avoid a flood of mmap calls when a single one is enough to map a
huge MAP_PRIVATE region, like shared libs that you may only execute
partially (this is why total_vm is usually much bigger than the real ram
mapped by pagetables, represented in rss). But those shared libs are
99% pageable and they don't need to stay in swap or ram, so
Committed_AS greatly overestimates the actual needs even if shared lib
loading weren't 64bit optimized (i.e. large and a single one).

> A the time of "malloc: Cannot allocate memory":
> 
> CommitLimit:     3364440 kB
> Committed_AS:    3240200 kB
> 
> So probably everything is ok (and free is misleading). Overcommit is
> unfortunately necessary if I want to be able to use all my memory.

Add more swap.

> Btw. http://www.redhat.com/advice/tips/meminfo.html says Committed_AS is
> a (gu)estimate. Hope it is a good (not to high) guesstimate. :)

It is a guess in the sense that, to guarantee no ENOMEM, it has to take
into account the worst possible case, that is, all shared lib MAP_PRIVATE
mappings being COWed, which is very far from reality. Other than that,
Committed_AS should exactly match all mmapped, possibly writeable
space that can only fit in ram+swap, so from that point of view it's
not a guessed number (modulo the smp read out of order). The only
guess is how much slab, cache and other stuff is freeable, which
keeps OVERCOMMIT_NEVER from being truly perfect.
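
As a rough illustration of "mmapped possibly writeable space", the sketch
below sums the sizes of private, writable mappings from /proc/pid/maps-style
text. This is an approximation under stated assumptions, not the kernel's
accounting: it ignores shared memory, MAP_NORESERVE and other special cases,
and the sample lines are made up.

```python
def private_writable_kb(maps_text: str) -> int:
    """Sum sizes (kB) of mappings whose permissions are writable and private."""
    total = 0
    for line in maps_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        addr_range, perms = parts[0], parts[1]
        if "w" in perms and perms.endswith("p"):
            start, end = (int(x, 16) for x in addr_range.split("-"))
            total += (end - start) // 1024
    return total


# Made-up sample in the format of /proc/<pid>/maps.
sample = """\
00400000-00452000 r-xp 00000000 08:01 1234 /usr/bin/example
00651000-00652000 rw-p 00051000 08:01 1234 /usr/bin/example
7f0000000000-7f0000200000 rw-p 00000000 00:00 0
7f0000200000-7f0000300000 r--s 00000000 08:01 5678 /tmp/shared
"""
print(private_writable_kb(sample))  # 4 + 2048 = 2052
```

The read-only text mapping and the shared mapping are skipped; only the two
rw-p regions count, whether or not the process has touched those pages yet.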

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 15:15                                               ` Andrea Arcangeli
@ 2009-10-30 16:24                                                 ` Hugh Dickins
  -1 siblings, 0 replies; 128+ messages in thread
From: Hugh Dickins @ 2009-10-30 16:24 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Vedran Furač, David Rientjes, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

On Fri, 30 Oct 2009, Andrea Arcangeli wrote:
> 
> It is a guess in the sense that, to guarantee no ENOMEM, it has to take
> into account the worst possible case, that is, all shared lib MAP_PRIVATE
> mappings being cowed, which is very far from reality.

A MAP_PRIVATE area is only counted into Committed_AS when it is or
has in the past been PROT_WRITE.  I think it's up to the ELF header
of the shared library whether a section is PROT_WRITE or not; but it
looks like many are not, so Committed_AS should be (a little) nearer
reality than you fear.

Though we do account for Committed_AS, even while allowing overcommit,
we do not at present account for Committed_AS per mm.  Seeing David
and KAMEZAWA-san debating over total_vm versus rss versus anon_rss,
I wonder whether such a "commit" count might be a better measure for
OOM choices (but shmem is as usual awkward: though accounted just once
in Committed_AS, it would probably have to be accounted to every mm
that maps it).  Just an idea to throw into the mix.

Hugh

^ permalink raw reply	[flat|nested] 128+ messages in thread
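
A per-process "commit" figure of the kind suggested above can be roughly
approximated from userspace. This is only a sketch under stated
assumptions: it reads text in the /proc/<pid>/maps format and sums the
mappings that are private and writable right now. The function name
private_writable_kb() and the sample addresses are made up for
illustration. It cannot see mappings that were PROT_WRITE only in the
past, and it ignores shmem entirely, so it is not the kernel's own
accounting.

```python
def private_writable_kb(maps_text):
    """Sum the sizes (in kB) of mappings that are private and writable."""
    total = 0
    for line in maps_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        perms = parts[1]  # e.g. "rw-p": read, write, exec, private/shared
        if len(perms) < 4 or perms[1] != "w" or perms[3] != "p":
            continue
        start, _, end = parts[0].partition("-")
        total += (int(end, 16) - int(start, 16)) // 1024
    return total


# Hypothetical address space: text, data, anonymous heap, shmem, library.
SAMPLE_MAPS = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
0060a000-0060c000 rw-p 0000a000 08:01 1234 /bin/cat
7f0000000000-7f0000100000 rw-p 00000000 00:00 0
7f0000200000-7f0000300000 rw-s 00000000 00:05 99 /dev/shm/x
7f0000400000-7f0000500000 r--p 00000000 08:01 77 /lib/libfoo.so
"""

if __name__ == "__main__":
    print(private_writable_kb(SAMPLE_MAPS))  # 1032
```

Only the 8 kB data segment and the 1024 kB anonymous region are counted;
the read-only text, the read-only library mapping and the shared mapping
are skipped, which matches the point that read-only sections of shared
libraries do not inflate Committed_AS.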

* Re: Memory overcommit
  2009-10-30 15:15                                               ` Andrea Arcangeli
@ 2009-11-02 19:56                                                 ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-11-02 19:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

Andrea Arcangeli wrote:

> On Fri, Oct 30, 2009 at 03:41:12PM +0100, Vedran Furač wrote:
>> Oh... so this is because apps "reserve" (Committed_AS?) more than they
>> currently need.
> 
> They don't actually reserve, they end up "reserving" only if overcommit is
> set to 2 (OVERCOMMIT_NEVER)... Apps aren't reserving, more likely they
> simply avoid a flood of mmap calls when a single one is enough to map a
> huge MAP_PRIVATE region like shared libs that you may only execute
> partially (this is why total_vm is usually much bigger than real ram
> mapped by pagetables represented in rss). But those shared libs are
> 99% pageable and they don't need to stay in swap or ram, so
> overcommit-as greatly overestimates the actual needs even if shared lib
> loading wouldn't be 64bit optimized (i.e. large and a single one).

Thanks for the info!

>> At the time of "malloc: Cannot allocate memory":
>>
>> CommitLimit:     3364440 kB
>> Committed_AS:    3240200 kB
>>
>> So probably everything is ok (and free is misleading). Overcommit is
>> unfortunately necessary if I want to be able to use all my memory.
> 
> Add more swap.

I don't use swap. With current prices of RAM, swap is history, at least
for desktops. I hate it when e.g. firefox gets swapped out if I don't use
it for a while. Removing swap decreased desktop latencies drastically.
And I don't care much if I lose 100MB of potential free memory that
could be used for disk cache...

Regards.

Vedran


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-30 13:53                                         ` Vedran Furač
@ 2009-10-30 19:44                                           ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-30 19:44 UTC (permalink / raw)
  To: vedran.furac
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, Vedran Furac wrote:

> Well, you are a kernel hacker, not me. You know how linux mm works much
> better than I do. I just reported what I think is a big problem, which
> needs to be solved ASAP (2.6.33).

The oom killer heuristics have not been changed recently, so why is this
suddenly a problem that needs to be immediately addressed?  The heuristics
you've been referring to have been used for at least three years.

> I'm afraid that we'll just talk much
> and nothing will be done with solution/fix postponed indefinitely. Not
> sure if you are interested, but I tested this on Windows XP also, and
> nothing bad happens there, system continues to function properly.
> 

I'm totally sympathetic to testcases such as your own where the oom killer 
seems to react in an undesirable way.  I agree that it could do a much 
better job at targeting "test" and killing it without negatively impacting 
other tasks.

However, I don't think we can simply change the baseline (like the rss 
change which has been added to -mm (??)) and consider it a major 
improvement when it severely impacts how system administrators are able to 
tune the badness heuristic from userspace via /proc/pid/oom_adj.  I'm sure 
you'd agree that user input is important in this matter and so that we 
should maximize that ability rather than make it more difficult.  That's 
my main criticism of the suggestions thus far (and, sorry, but I have to 
look out for production server interests here: you can't take away our 
ability to influence oom badness scoring just because other simple 
heuristics may be more understandable).

> > Much better is to allow the user to decide at what point, regardless of 
> > swap usage, their application is using much more memory than expected or 
> > required.  They can do that right now pretty well with /proc/pid/oom_adj 
> > without this outlandish claim that they should be expected to know the rss 
> > of their applications at the time of oom to effectively tune oom_adj.
> 
> Believe me, barely a few developers use oom_adj for their applications,
> and probably almost none of the end users. What should they do, every
> time they start an application, go to console and set the oom_adj. You
> cannot expect them to do that.
> 

oom_adj is an extremely important part of our infrastructure and although 
the majority of Linux users may not use it (I know a number of opensource 
programs that tune their own, however), we can't let go of our ability to
specify an oom killing priority.

There are no simple solutions to this problem: the model proposed thus 
far, which has basically been to acknowledge that oom killer is a bad 
thing to encounter (but within that, some rationale was found that we can 
react however we want??) and should be extremely easy to understand (just 
kill the memory hogger with the most resident RAM) is a non-starter.

What would be better, and what I think we'll end up with, is a root 
selectable heuristic so that production servers and desktop machines can 
use different heuristics to make oom kill selections.  We already have 
/proc/sys/vm/oom_kill_allocating_task which I added 1-2 years ago to 
address concerns specifically of SGI and their enormously long tasklist 
scans.  This would be a variation on that idea and would include different
simplistic behaviors (such as always killing the most memory hogging task, 
killing the most recently started task by the same uid, etc), and leave 
the default heuristic much the same as currently.

^ permalink raw reply	[flat|nested] 128+ messages in thread
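
The tuning concern above can be made concrete with a small model. This is
a sketch, assuming the behaviour of 2.6-era kernels as I understand it:
/proc/<pid>/oom_adj takes values from -16 to 15, the baseline badness
score is bit-shifted by it, and -17 exempts the task altogether. The
function name apply_oom_adj() and the numbers for the two tasks are
invented for illustration only.

```python
OOM_DISABLE = -17      # task is exempt from OOM killing
OOM_ADJUST_MIN = -16
OOM_ADJUST_MAX = 15


def apply_oom_adj(points, oom_adj):
    """Scale a baseline badness score the way oom_adj is assumed to."""
    if oom_adj == OOM_DISABLE:
        return 0
    if not OOM_ADJUST_MIN <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj out of range")
    if oom_adj > 0:
        # A zero score is bumped to 1 so the shift has an effect.
        return max(points, 1) << oom_adj
    return points >> -oom_adj


if __name__ == "__main__":
    # Hypothetical tasks: (total_vm, rss, oom_adj), sizes in pages.
    task_a = (800000, 20000, -2)   # huge address space, small rss, protected
    task_b = (100000, 90000, 0)    # modest address space, mostly resident

    for label, index in (("total_vm", 0), ("rss", 1)):
        a = apply_oom_adj(task_a[index], task_a[2])
        b = apply_oom_adj(task_b[index], task_b[2])
        print(label, "baseline picks", "A" if a > b else "B")
```

With a total_vm baseline task A is still the victim despite oom_adj = -2
(200000 versus 100000); with an rss baseline the same setting leaves B as
the victim (5000 versus 90000). An oom_adj value chosen against one
baseline therefore means something quite different under the other.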

* Re: Memory overcommit
  2009-10-30 19:44                                           ` David Rientjes
@ 2009-11-02 19:56                                             ` Vedran Furač
  -1 siblings, 0 replies; 128+ messages in thread
From: Vedran Furač @ 2009-11-02 19:56 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Fri, 30 Oct 2009, Vedran Furac wrote:
> 
>> Well, you are a kernel hacker, not me. You know how linux mm works much
>> better than I do. I just reported what I think is a big problem, which
>> needs to be solved ASAP (2.6.33).
> 
> The oom killer heuristics have not been changed recently, so why is this
> suddenly a problem that needs to be immediately addressed?  The heuristics
> you've been referring to have been used for at least three years.

It isn't "suddenly a problem", just a problem: a big, long-standing
problem. If it is three years old, then it should have been addressed
asap three years ago (and we would not need to talk about it now,
hopefully).

> However, I don't think we can simply change the baseline (like the rss 
> change which has been added to -mm (??)) and consider it a major 
> improvement when it severely impacts how system administrators are able to 
> tune the badness heuristic from userspace via /proc/pid/oom_adj.  I'm sure 
> you'd agree that user input is important in this matter and so that we 
> should maximize that ability rather than make it more difficult.  That's 
> my main criticism of the suggestions thus far (and, sorry, but I have to 
> look out for production server interests here: you can't take away our 
> ability to influence oom badness scoring just because other simple 
> heuristics may be more understandable).
> 
> What would be better, and what I think we'll end up with, is a root 
> selectable heuristic so that production servers and desktop machines can 
> use different heuristics to make oom kill selections.  We already have 
> /proc/sys/vm/oom_kill_allocating_task which I added 1-2 years ago to 
> address concerns specifically of SGI and their enormously long tasklist 
> scans.  This would be a variation on that idea and would include different
> simplistic behaviors (such as always killing the most memory hogging task, 
> killing the most recently started task by the same uid, etc), and leave 
> the default heuristic much the same as currently.

OK, agreed. Did you take a look at the set of patches Kame sent today?

Regards,

Vedran

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27 20:44                 ` Hugh Dickins
@ 2009-10-28  0:43                   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28  0:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: vedran.furac, linux-mm, linux-kernel, kosaki.motohiro,
	minchan.kim, akpm, rientjes, aarcange

On Tue, 27 Oct 2009 20:44:16 +0000 (GMT)
Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:

> On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> > Sigh, gnome-session has twice the value of mmap(1G).
> > Of course, gnome-session only uses 6M bytes of anon.
> > I wonder if this is because gnome-session has many children, but I need
> > to dig more. Does anyone have an idea?
> 
> When preparing KSM unmerge to handle OOM, I looked at how the precedent
> was handled by running a little program which mmaps an anonymous region
> of the same size as physical memory, then tries to mlock it.  The
> program was such an obvious candidate to be killed, I was shocked
> by the poor decisions the OOM killer made.  Usually I ran it with
> mem=512M, with gnome and firefox active.  Often the OOM killer killed
> it right the first time, but went wrong when I tried it a second time
> (I think that's because of what's already swapped out the first time).
> 
> I built up a patchset of fixes, but once I came to split them up for
> submission, not one of them seemed entirely satisfactory; and Andrea's
> fix to the KSM/mlock deadlock forced me to abandon even the first of
> the patches (we've since then fixed the way munlocking behaves, so
> in theory could revisit that; but Andrea disliked what I was trying
> to do there in KSM for other reasons, so I've not touched it since).
> I had to get on with KSM, so I set it all aside: none of the issues
> was a recent regression.
> 
> I did briefly wonder about the reliance on total_vm which you're now
> looking into, but didn't touch that at all.  Let me describe those
> issues which I did try but fail to fix - I've no more time to deal
> with them now than then, but ought at least to mention them to you.
> 
Okay, thank you for the detailed information.


> 1.  select_bad_process() tries to avoid killing another process while
> there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
> processes.  However, p->mm is set to NULL well before p reaches
> exit_mmap() to actually free the memory, and there may be significant
> delays in between (I think exit_robust_list() gave me a hang at one
> stage).  So in practice, even when the OOM killer selects the right
> process to kill, there can be lots of collateral damage from it not
> waiting long enough for that process to give up its memory.
> 
Hmm.

> I tried to deal with that by moving the TIF_MEMDIE test up before
> the p->mm test, but adding in a check on p->exit_state:
> 		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> 		    !p->exit_state)
> 			return ERR_PTR(-1UL);
> But this is then liable to hang the system if there's some reason
> why the selected process cannot proceed to free its memory (e.g.
> the current KSM unmerge case).  It needs to wait "a while", but
> give up if no progress is made, instead of hanging: originally
> I thought that setting PF_MEMALLOC more widely in page_alloc.c,
> and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
> would deal with that; but we cannot be sure that waiting for memory
> is the only reason for a holdup there (in the KSM unmerge case it's
> waiting for an mmap_sem, and there may well be other such cases).
> 
OK, so a simple fix won't help.

> 2.  I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first).  But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed.  Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
> 
I can't agree with that part of the heuristics, either.

> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway?  I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world.  Maybe this patch was okay.
> 
ok.



> 3.  badness() has a comment above it which says:  
>  * 5) we try to kill the process the user expects us to kill, this
>  *    algorithm has been meticulously tuned to meet the principle
>  *    of least surprise ... (be careful when you change it)
> But Andrea's 2.6.11 86a4c6d9e2e43796bb362debd3f73c0e3b198efa (later
> refined by Kurt's 2.6.16 9827b781f20828e5ceb911b879f268f78fe90815)
> adds plenty of surprise there, by trying to factor children into the
> calculation.  Intended to deal with forkbombs, but any reasonable
> process whose purpose is to fork children (e.g. gnome-session)
> becomes very vulnerable.  And whereas badness() itself goes on to
> refine the total_vm points by various adjustments peculiar to the
> process in question, those refinements have been ignored when
> adding the child's total_vm/2.  (Andrea does remark that he'd
> rather have rewritten badness() from scratch.)
> 
> I tried to fix this by moving the PF_OOM_ORIGIN (was PF_SWAPOFF)
> part of the calculation up to select_bad_process(), making a
> solo_badness() function which makes all those adjustments to
> total_vm, then badness() itself a simple function adding half
> the children's solo_badness()es to the process' own solo_badness().
> But probably lots more needs doing - Andrea's rewrite?
> 
> 4.  In some cases those children are sharing exactly the same mm,
> yet its total_vm is being added again and again to the points:
> I had a nasty inner loop searching back to see if we'd already
> counted this mm (but then, what if the different tasks sharing
> the mm deserved different adjustments to the total_vm?).
> 
> 
> I hope these notes help someone towards a better solution
> (and be prepared to discover more on the way).  I agree with
> Vedran that the present behaviour is pretty unimpressive, and
> I'm puzzled as to how people can have been tinkering with
> oom_kill.c down the years without seeing any of this.
> 

Sorry, I usually don't use X on servers, and almost all of my recent OOM
tests were done under memcg ;(
Thank you for your investigation. Maybe I'll need several steps.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 128+ messages in thread
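
Points 2 and 3 in the quoted message can be put into numbers. The sketch
below models only what that message describes: a total_vm baseline, half
of each child's total_vm added on top, then one quartering for
CAP_SYS_ADMIN or CAP_SYS_RESOURCE and another for CAP_SYS_RAWIO. The
function name badness_sketch() and its parameters are hypothetical, the
CPU-time, run-time and nice adjustments of the real badness() are left
out, and the sizes (in pages) are invented.

```python
def badness_sketch(total_vm, children_total_vm=(),
                   has_admin_or_resource=False, has_rawio=False):
    """Simplified model of the badness score described in the thread."""
    points = total_vm
    # Each child contributes half of its total_vm to the parent's score.
    for child_vm in children_total_vm:
        points += child_vm // 2
    # Two independent quarterings: root ends up with 1/16 of the score.
    if has_admin_or_resource:
        points //= 4
    if has_rawio:
        points //= 4
    return points


if __name__ == "__main__":
    one_gib = 262144  # pages, assuming 4 KiB pages

    print(badness_sketch(one_gib))                      # 262144
    print(badness_sketch(one_gib,
                         has_admin_or_resource=True,
                         has_rawio=True))               # 16384
    # A small session manager with twelve children of 90000 pages each.
    print(badness_sketch(50000, [90000] * 12))          # 590000
```

In this model a 1 GiB memory hog run as root scores sixteen times lower
than the same program run unprivileged, while a small parent with many
children scores more than twice as high as the unprivileged hog. That is
the same shape as the gnome-session observation at the top of the quoted
message.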

* Re: Memory overcommit
@ 2009-10-28  0:43                   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28  0:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: vedran.furac, linux-mm, linux-kernel, kosaki.motohiro,
	minchan.kim, akpm, rientjes, aarcange

On Tue, 27 Oct 2009 20:44:16 +0000 (GMT)
Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:

> On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> > Sigh, gnome-session has twice the value of mmap(1G).
> > Of course, gnome-session only uses 6M bytes of anon.
> > I wonder if this is because gnome-session has many children, but I need
> > to dig more. Does anyone have an idea?
> 
> When preparing KSM unmerge to handle OOM, I looked at how the precedent
> was handled by running a little program which mmaps an anonymous region
> of the same size as physical memory, then tries to mlock it.  The
> program was such an obvious candidate to be killed, I was shocked
> by the poor decisions the OOM killer made.  Usually I ran it with
> mem=512M, with gnome and firefox active.  Often the OOM killer killed
> it right the first time, but went wrong when I tried it a second time
> (I think that's because of what's already swapped out the first time).
> 
> I built up a patchset of fixes, but once I came to split them up for
> submission, not one of them seemed entirely satisfactory; and Andrea's
> fix to the KSM/mlock deadlock forced me to abandon even the first of
> the patches (we've since then fixed the way munlocking behaves, so
> in theory could revisit that; but Andrea disliked what I was trying
> to do there in KSM for other reasons, so I've not touched it since).
> I had to get on with KSM, so I set it all aside: none of the issues
> was a recent regression.
> 
> I did briefly wonder about the reliance on total_vm which you're now
> looking into, but didn't touch that at all.  Let me describe those
> issues which I did try but fail to fix - I've no more time to deal
> with them now than then, but ought at least to mention them to you.
> 
Okay, thank you for the detailed information.


> 1.  select_bad_process() tries to avoid killing another process while
> there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
> processes.  However, p->mm is set to NULL well before p reaches
> exit_mmap() to actually free the memory, and there may be significant
> delays in between (I think exit_robust_list() gave me a hang at one
> stage).  So in practice, even when the OOM killer selects the right
> process to kill, there can be lots of collateral damage from it not
> waiting long enough for that process to give up its memory.
> 
Hmm.

> I tried to deal with that by moving the TIF_MEMDIE test up before
> the p->mm test, but adding in a check on p->exit_state:
> 		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> 		    !p->exit_state)
> 			return ERR_PTR(-1UL);
> But this is then liable to hang the system if there's some reason
> why the selected process cannot proceed to free its memory (e.g.
> the current KSM unmerge case).  It needs to wait "a while", but
> give up if no progress is made, instead of hanging: originally
> I thought that setting PF_MEMALLOC more widely in page_alloc.c,
> and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
> would deal with that; but we cannot be sure that waiting for memory
> is the only reason for a holdup there (in the KSM unmerge case it's
> waiting for an mmap_sem, and there may well be other such cases).
> 
OK, so a simple fix won't help here.

> 2.  I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first).  But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed.  Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
> 
I can't agree with that part of the heuristics, either.

> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway?  I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world.  Maybe this patch was okay.
> 
ok.



> 3.  badness() has a comment above it which says:  
>  * 5) we try to kill the process the user expects us to kill, this
>  *    algorithm has been meticulously tuned to meet the principle
>  *    of least surprise ... (be careful when you change it)
> But Andrea's 2.6.11 86a4c6d9e2e43796bb362debd3f73c0e3b198efa (later
> refined by Kurt's 2.6.16 9827b781f20828e5ceb911b879f268f78fe90815)
> adds plenty of surprise there, by trying to factor children into the
> calculation.  Intended to deal with forkbombs, but any reasonable
> process whose purpose is to fork children (e.g. gnome-session)
> becomes very vulnerable.  And whereas badness() itself goes on to
> refine the total_vm points by various adjustments peculiar to the
> process in question, those refinements have been ignored when
> adding the child's total_vm/2.  (Andrea does remark that he'd
> rather have rewritten badness() from scratch.)
> 
> I tried to fix this by moving the PF_OOM_ORIGIN (was PF_SWAPOFF)
> part of the calculation up to select_bad_process(), making a
> solo_badness() function which makes all those adjustments to
> total_vm, then badness() itself a simple function adding half
> the children's solo_badness()es to the process' own solo_badness().
> But probably lots more needs doing - Andrea's rewrite?
> 
> 4.  In some cases those children are sharing exactly the same mm,
> yet its total_vm is being added again and again to the points:
> I had a nasty inner loop searching back to see if we'd already
> counted this mm (but then, what if the different tasks sharing
> the mm deserved different adjustments to the total_vm?).
> 
> 
> I hope these notes help someone towards a better solution
> (and be prepared to discover more on the way).  I agree with
> Vedran that the present behaviour is pretty unimpressive, and
> I'm puzzled as to how people can have been tinkering with
> oom_kill.c down the years without seeing any of this.
> 

Sorry, I usually don't use X on servers, and almost all of my recent OOM
tests were done under memcg ;(
Thank you for your investigation. Maybe I'll need several steps.

Thanks,
-Kame
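
Points 3 and 4 above can be illustrated with a small userspace sketch. It
is not the kernel's badness(): struct demo_mm, struct demo_task and both
functions are hypothetical names, only the "own total_vm plus half of each
child's total_vm" rule is taken from the description in this thread, and
skipping children that share the parent's mm is an assumption of the
sketch. naive_points() counts a shared mm once per child; dedup_points()
adds the kind of inner loop Hugh mentions, searching back to see whether
an mm was already counted.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's structures. */
struct demo_mm {
	unsigned long total_vm;
};

struct demo_task {
	struct demo_mm *mm;
	struct demo_task **children;
	size_t nr_children;
};

/* Own total_vm plus half of every child's total_vm, as described above.
 * Children sharing the parent's mm are skipped (sketch assumption). */
static unsigned long naive_points(const struct demo_task *p)
{
	unsigned long points = p->mm->total_vm;
	size_t i;

	for (i = 0; i < p->nr_children; i++) {
		const struct demo_task *c = p->children[i];

		if (c->mm && c->mm != p->mm)
			points += c->mm->total_vm / 2;
	}
	return points;
}

/* Same rule, but each distinct child mm is counted only once. */
static unsigned long dedup_points(const struct demo_task *p)
{
	unsigned long points = p->mm->total_vm;
	size_t i, j;

	for (i = 0; i < p->nr_children; i++) {
		const struct demo_task *c = p->children[i];
		int seen = 0;

		if (!c->mm || c->mm == p->mm)
			continue;
		/* Search back over earlier children for the same mm. */
		for (j = 0; j < i; j++) {
			if (p->children[j]->mm == c->mm) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			points += c->mm->total_vm / 2;
	}
	return points;
}
```

For a parent with total_vm 1000 and three children sharing one mm of
total_vm 400, naive_points() gives 1600 while dedup_points() gives 1200.
The sketch does not answer Hugh's follow-up question of what to do when
tasks sharing an mm deserve different adjustments.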

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-27 20:44                 ` Hugh Dickins
@ 2009-10-28  2:47                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 128+ messages in thread
From: KOSAKI Motohiro @ 2009-10-28  2:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, vedran.furac, linux-mm,
	linux-kernel, minchan.kim, akpm, rientjes, aarcange

> 2.  I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first).  But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed.  Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
> 
> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway?  I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world.  Maybe this patch was okay.

I agree quartering is debatable.
At least, killing quartering is worthwhile for any user, and it can be pushed
into -stable.




From 27331555366c908a93c2cdd780b77e421869c5af Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 28 Oct 2009 11:28:39 +0900
Subject: [PATCH] oom: Mitigate super-user's bonus of oom-score

Currently, the OOM badness calculation applies the following bonuses:
 - A super-user process has its oom-score quartered
 - A CAP_SYS_RAWIO process (e.g. a database) also has its oom-score quartered

The problem is that super-user processes have CAP_SYS_RAWIO too, so they get
a sixteenthing bonus. That is obviously excessive and meaningless.

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ea2147d..40d323d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -152,18 +152,15 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
 	/*
 	 * Superuser processes are usually more important, so we make it
 	 * less likely that we kill those.
-	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
-	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
-		points /= 4;
-
-	/*
-	 * We don't want to kill a process with direct hardware access.
+	 *
+	 * Plus, We don't want to kill a process with direct hardware access.
 	 * Not only could that mess up the hardware, but usually users
 	 * tend to only have this flag set on applications they think
 	 * of as important.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
+	    has_capability_noaudit(p, CAP_SYS_RAWIO))
 		points /= 4;
 
 	/*
-- 
1.6.2.5
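
The arithmetic behind the patch can be checked with a small userspace
sketch. The DEMO_CAP_* flags and both functions are hypothetical stand-ins
for the has_capability_noaudit() tests in the diff above, not kernel API.
scale_old() mirrors the two separate tests being removed and scale_new()
mirrors the single combined test being added.

```c
/* Hypothetical capability flags for illustration only; NOT kernel API. */
enum {
	DEMO_CAP_SYS_ADMIN    = 1 << 0,
	DEMO_CAP_SYS_RESOURCE = 1 << 1,
	DEMO_CAP_SYS_RAWIO    = 1 << 2,
};

/* Before the patch: two independent tests, each quartering the score. */
static unsigned long scale_old(unsigned long points, unsigned int caps)
{
	if (caps & (DEMO_CAP_SYS_ADMIN | DEMO_CAP_SYS_RESOURCE))
		points /= 4;
	if (caps & DEMO_CAP_SYS_RAWIO)
		points /= 4;
	return points;
}

/* After the patch: one combined test, so at most one quartering. */
static unsigned long scale_new(unsigned long points, unsigned int caps)
{
	if (caps & (DEMO_CAP_SYS_ADMIN | DEMO_CAP_SYS_RESOURCE |
		    DEMO_CAP_SYS_RAWIO))
		points /= 4;
	return points;
}
```

With all three flags set, as for a root process, a score of 1600 becomes
100 under scale_old() (divided by sixteen) and 400 under scale_new()
(divided by four). A process holding only DEMO_CAP_SYS_RAWIO gets the same
result from both.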





^ permalink raw reply related	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  2:47                   ` KOSAKI Motohiro
@ 2009-10-28  3:17                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 128+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28  3:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, vedran.furac, linux-mm, linux-kernel, minchan.kim,
	akpm, rientjes, aarcange

On Wed, 28 Oct 2009 11:47:55 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > 2.  I started out running my mlock test program as root (later
> > switched to use "ulimit -l unlimited" first).  But badness() reckons
> > CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> > and CAP_SYS_RAWIO another reason to quarter your points: so running
> > as root makes you sixteen times less likely to be killed.  Quartering
> > is anyway debatable, but sixteenthing seems utterly excessive to me.
> > 
> > I moved the CAP_SYS_RAWIO test in with the others, so it does no
> > more than quartering; but is quartering appropriate anyway?  I did
> > wonder if I was right to be "subverting" the fine-grained CAPs in
> > this way, but have since seen unrelated mail from one who knows
> > better, implying they're something of a fantasy, that su and sudo
> > are indeed what's used in the real world.  Maybe this patch was okay.
> 
> I agree quartering is debatable.
> At least, killing quartering is worthwhile for any user, and it can be pushed
> into -stable.
> 
> 
> 
> 
> From 27331555366c908a93c2cdd780b77e421869c5af Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Wed, 28 Oct 2009 11:28:39 +0900
> Subject: [PATCH] oom: Mitigate super-user's bonus of oom-score
> 
> Currently, the OOM badness calculation applies the following bonuses:
>  - A super-user process has its oom-score quartered
>  - A CAP_SYS_RAWIO process (e.g. a database) also has its oom-score quartered
> 
> The problem is that super-user processes have CAP_SYS_RAWIO too, so they get
> a sixteenthing bonus. That is obviously excessive and meaningless.
> 
> This patch fixes it.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

I'll pick this up into my series.

Thanks,
-Kame

> ---
>  mm/oom_kill.c |   13 +++++--------
>  1 files changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ea2147d..40d323d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -152,18 +152,15 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
>  	/*
>  	 * Superuser processes are usually more important, so we make it
>  	 * less likely that we kill those.
> -	 */
> -	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> -	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
> -		points /= 4;
> -
> -	/*
> -	 * We don't want to kill a process with direct hardware access.
> +	 *
> +	 * Plus, We don't want to kill a process with direct hardware access.
>  	 * Not only could that mess up the hardware, but usually users
>  	 * tend to only have this flag set on applications they think
>  	 * of as important.
>  	 */
> -	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
>  		points /= 4;
>  
>  	/*
> -- 
> 1.6.2.5
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  2:47                   ` KOSAKI Motohiro
@ 2009-10-28  4:12                     ` David Rientjes
  -1 siblings, 0 replies; 128+ messages in thread
From: David Rientjes @ 2009-10-28  4:12 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, vedran.furac, linux-mm,
	linux-kernel, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KOSAKI Motohiro wrote:

> I agree quartering is debatable.
> At least, killing quartering is worthwhile for any user, and it can be pushed
> into -stable.
> 

Not sure where the -stable reference came from, I don't think this is a 
candidate.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ea2147d..40d323d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -152,18 +152,15 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
>  	/*
>  	 * Superuser processes are usually more important, so we make it
>  	 * less likely that we kill those.
> -	 */
> -	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> -	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
> -		points /= 4;
> -
> -	/*
> -	 * We don't want to kill a process with direct hardware access.
> +	 *
> +	 * Plus, We don't want to kill a process with direct hardware access.
>  	 * Not only could that mess up the hardware, but usually users
>  	 * tend to only have this flag set on applications they think
>  	 * of as important.
>  	 */
> -	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
>  		points /= 4;
>  
>  	/*

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: Memory overcommit
  2009-10-28  4:12                     ` David Rientjes
@ 2009-10-28  8:10                       ` Hugh Dickins
  -1 siblings, 0 replies; 128+ messages in thread
From: Hugh Dickins @ 2009-10-28  8:10 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, KAMEZAWA Hiroyuki, vedran.furac, linux-mm,
	linux-kernel, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009, David Rientjes wrote:
> 
> Not sure where the -stable reference came from, I don't think this is a 
> candidate.

I agree with David, this is only one little piece of a messy puzzle,
there's no good reason to rush this into -stable.

> > +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> > +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> > +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
> 
> Acked-by: David Rientjes <rientjes@google.com>

Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>

(as far as it goes: the whole thing of quartering badness here
because "we don't want to kill" and "important" is questionable;
but definitely much more open to argument both ways than sixteenthing).

^ permalink raw reply	[flat|nested] 128+ messages in thread

end of thread, other threads:[~2009-11-04  3:22 UTC | newest]

Thread overview: 128+ messages
2005-12-09 22:00 Memory overcommit Tracy R Reed
2005-12-11  2:00 ` Kip Macy
2005-12-11 15:45   ` Keir Fraser
2005-12-11 19:59     ` Rik van Riel
2005-12-13 16:10       ` Keir Fraser
2005-12-13 16:25         ` Jacob Gorm Hansen
  -- strict thread matches above, loose matches on Subject: below --
2009-10-12 11:51 Vedran Furač
2009-10-13  3:08 ` KAMEZAWA Hiroyuki
2009-10-13 17:13   ` Vedran Furač
2009-10-14  4:51     ` KAMEZAWA Hiroyuki
2009-10-20 21:52       ` Vedran Furač
2009-10-26  1:55         ` KAMEZAWA Hiroyuki
2009-10-26 16:16           ` Vedran Furač
2009-10-26 16:16             ` Vedran Furač
2009-10-27  3:22             ` KAMEZAWA Hiroyuki
2009-10-27  3:22               ` KAMEZAWA Hiroyuki
2009-10-27  6:10               ` KOSAKI Motohiro
2009-10-27  6:10                 ` KOSAKI Motohiro
2009-10-27  6:34                 ` Minchan Kim
2009-10-27  6:34                   ` Minchan Kim
2009-10-27  6:36                   ` KAMEZAWA Hiroyuki
2009-10-27  6:36                     ` KAMEZAWA Hiroyuki
2009-10-27  6:55                     ` Minchan Kim
2009-10-27  6:55                       ` Minchan Kim
2009-10-27  6:46                   ` KOSAKI Motohiro
2009-10-27  6:46                     ` KOSAKI Motohiro
2009-10-27  6:56                     ` Minchan Kim
2009-10-27  6:56                       ` Minchan Kim
2009-10-27 17:12               ` Vedran Furač
2009-10-27 17:12                 ` Vedran Furač
2009-10-27 18:02                 ` KOSAKI Motohiro
2009-10-27 18:30                   ` Vedran Furač
2009-10-27 18:30                     ` Vedran Furač
2009-10-27 20:44               ` Hugh Dickins
2009-10-27 20:44                 ` Hugh Dickins
2009-10-27 21:04                 ` David Rientjes
2009-10-27 21:04                   ` David Rientjes
2009-10-28  0:08                   ` Vedran Furač
2009-10-28  0:08                     ` Vedran Furač
2009-10-28  0:25                     ` David Rientjes
2009-10-28  0:25                       ` David Rientjes
2009-10-28  0:39                       ` Vedran Furač
2009-10-28  0:39                         ` Vedran Furač
2009-10-28  4:08                         ` David Rientjes
2009-10-28  4:08                           ` David Rientjes
2009-10-28  4:55                           ` KAMEZAWA Hiroyuki
2009-10-28  4:55                             ` KAMEZAWA Hiroyuki
2009-10-28  5:13                             ` David Rientjes
2009-10-28  5:13                               ` David Rientjes
2009-10-28  6:05                               ` KAMEZAWA Hiroyuki
2009-10-28  6:05                                 ` KAMEZAWA Hiroyuki
2009-10-28  6:17                                 ` David Rientjes
2009-10-28  6:17                                   ` David Rientjes
2009-10-28  6:20                                   ` KAMEZAWA Hiroyuki
2009-10-28  6:20                                     ` KAMEZAWA Hiroyuki
2009-10-29  8:38                                     ` David Rientjes
2009-10-29  8:38                                       ` David Rientjes
2009-10-29 11:11                                       ` Vedran Furač
2009-10-29 11:11                                         ` Vedran Furač
2009-10-29 19:53                                         ` David Rientjes
2009-10-29 19:53                                           ` David Rientjes
2009-10-29 23:48                                           ` KAMEZAWA Hiroyuki
2009-10-29 23:48                                             ` KAMEZAWA Hiroyuki
2009-10-30  9:10                                             ` David Rientjes
2009-10-30  9:10                                               ` David Rientjes
2009-10-30  9:36                                               ` KAMEZAWA Hiroyuki
2009-10-30  9:36                                                 ` KAMEZAWA Hiroyuki
2009-10-30 10:49                                                 ` Thomas Fjellstrom
2009-11-03 20:49                                                 ` David Rientjes
2009-11-03 20:49                                                   ` David Rientjes
2009-11-04  0:50                                                   ` KAMEZAWA Hiroyuki
2009-11-04  0:50                                                     ` KAMEZAWA Hiroyuki
2009-11-04  1:58                                                     ` David Rientjes
2009-11-04  1:58                                                       ` David Rientjes
2009-11-04  2:17                                                       ` KAMEZAWA Hiroyuki
2009-11-04  2:17                                                         ` KAMEZAWA Hiroyuki
2009-11-04  3:10                                                         ` David Rientjes
2009-11-04  3:10                                                           ` David Rientjes
2009-11-04  3:19                                                           ` KAMEZAWA Hiroyuki
2009-11-04  3:19                                                             ` KAMEZAWA Hiroyuki
2009-10-30 13:59                                           ` Vedran Furač
2009-10-30 13:59                                             ` Vedran Furač
2009-10-30 19:24                                             ` David Rientjes
2009-10-30 19:24                                               ` David Rientjes
2009-11-02 19:58                                               ` Vedran Furač
2009-11-02 19:58                                                 ` Vedran Furač
2009-10-28 13:28                           ` Vedran Furač
2009-10-28 13:28                             ` Vedran Furač
2009-10-28 20:10                             ` David Rientjes
2009-10-28 20:10                               ` David Rientjes
2009-10-29  3:05                               ` Vedran Furač
2009-10-29  3:05                                 ` Vedran Furač
2009-10-29  8:35                                 ` David Rientjes
2009-10-29  8:35                                   ` David Rientjes
2009-10-29 11:01                                   ` Vedran Furač
2009-10-29 11:01                                     ` Vedran Furač
2009-10-29 19:42                                     ` David Rientjes
2009-10-29 19:42                                       ` David Rientjes
2009-10-30 13:53                                       ` Vedran Furač
2009-10-30 13:53                                         ` Vedran Furač
2009-10-30 14:08                                         ` Thomas Fjellstrom
2009-10-30 14:08                                           ` Thomas Fjellstrom
2009-10-30 15:13                                           ` Vedran Furač
2009-10-30 15:13                                             ` Vedran Furač
2009-10-30 14:12                                         ` Andrea Arcangeli
2009-10-30 14:12                                           ` Andrea Arcangeli
2009-10-30 14:41                                           ` Vedran Furač
2009-10-30 14:41                                             ` Vedran Furač
2009-10-30 15:15                                             ` Andrea Arcangeli
2009-10-30 15:15                                               ` Andrea Arcangeli
2009-10-30 16:24                                               ` Hugh Dickins
2009-10-30 16:24                                                 ` Hugh Dickins
2009-11-02 19:56                                               ` Vedran Furač
2009-11-02 19:56                                                 ` Vedran Furač
2009-10-30 19:44                                         ` David Rientjes
2009-10-30 19:44                                           ` David Rientjes
2009-11-02 19:56                                           ` Vedran Furač
2009-11-02 19:56                                             ` Vedran Furač
2009-10-28  0:43                 ` KAMEZAWA Hiroyuki
2009-10-28  0:43                   ` KAMEZAWA Hiroyuki
2009-10-28  2:47                 ` KOSAKI Motohiro
2009-10-28  2:47                   ` KOSAKI Motohiro
2009-10-28  3:17                   ` KAMEZAWA Hiroyuki
2009-10-28  3:17                     ` KAMEZAWA Hiroyuki
2009-10-28  4:12                   ` David Rientjes
2009-10-28  4:12                     ` David Rientjes
2009-10-28  8:10                     ` Hugh Dickins
2009-10-28  8:10                       ` Hugh Dickins
