* Re: OOM notifications
       [not found] <20071018201531.GA5938@dmt>
@ 2007-10-26 21:02 ` Andrew Morton
  2007-10-26 21:05   ` Martin Bligh
  2007-10-28 21:16   ` Balbir Singh
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2007-10-26 21:02 UTC
  To: Marcelo Tosatti; +Cc: linux-kernel, drepper, riel, Martin Bligh, linux-mm

On Thu, 18 Oct 2007 16:15:31 -0400
Marcelo Tosatti <marcelo@kvack.org> wrote:

> Hi,
>
> AIX contains the SIGDANGER signal to notify applications to free up some
> unused cached memory:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
>
> There have been a few discussions on implementing such an idea on Linux,
> but nothing concrete has been achieved.
>
> On the kernel side Rik suggested two notification points: "about to
> swap" (for desktop scenarios) and "about to OOM" (for embedded-like
> scenarios).
>
> With that assumption in mind it would be necessary to either have two
> special devices for notification, or somehow indicate both events
> through the same file descriptor.
>
> Comments are more than welcome.

Martin was talking about some mad scheme wherein you'd create a bunch of
pseudo files (say, /proc/foo/0, /proc/foo/1, ..., /proc/foo/9) and each one
would become "ready" when the MM scanning priority reaches 10%, 20%, ...
100%.

Obviously there would need to be a lot of abstraction to unhook a permanent
userspace feature from a transient kernel implementation, but the basic
idea is that a process which wants to know when the VM is getting into the
orange zone would select() on the file "7" and a process which wants to
know when the VM is getting into the red zone would select() on file "9".

It gets more complicated with NUMA memory nodes and cgroup memory
controllers.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
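
The select()-on-a-threshold-file idea above might look roughly like this from userspace. This is only a sketch under stated assumptions: the /proc/foo/N files do not exist, so ordinary pipes stand in for them, and `wait_for_pressure` is a hypothetical helper name, not an existing interface.

```python
import os
import select

def wait_for_pressure(level_fds, timeout=None):
    """Block until at least one watched level becomes readable.

    level_fds maps a pressure level (e.g. 7 for "orange", 9 for "red")
    to an open file descriptor. Returns the sorted list of ready levels.
    """
    fd_to_level = {fd: level for level, fd in level_fds.items()}
    readable, _, _ = select.select(list(fd_to_level), [], [], timeout)
    return sorted(fd_to_level[fd] for fd in readable)

# Stand-ins for the hypothetical /proc/foo/7 and /proc/foo/9: plain pipes,
# where writing a byte plays the role of the kernel marking a file "ready".
orange_r, orange_w = os.pipe()
red_r, red_w = os.pipe()
watched = {7: orange_r, 9: red_r}

quiet = wait_for_pressure(watched, timeout=0)     # nothing signalled yet
os.write(orange_w, b"x")                          # simulate entering the orange zone
orange = wait_for_pressure(watched, timeout=1.0)  # only level 7 is ready
```

A process that cares only about the red zone would pass just `{9: red_r}`; the point of the scheme is that each consumer picks its own threshold.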

* Re: OOM notifications
  2007-10-26 21:02 ` OOM notifications Andrew Morton
@ 2007-10-26 21:05   ` Martin Bligh
  2007-10-26 21:11     ` Andrew Morton
  2007-10-28 21:16   ` Balbir Singh
  1 sibling, 1 reply; 7+ messages in thread
From: Martin Bligh @ 2007-10-26 21:05 UTC
  To: Andrew Morton; +Cc: Marcelo Tosatti, linux-kernel, drepper, riel, linux-mm

Andrew Morton wrote:
> On Thu, 18 Oct 2007 16:15:31 -0400
> Marcelo Tosatti <marcelo@kvack.org> wrote:
>
>> Hi,
>>
>> AIX contains the SIGDANGER signal to notify applications to free up some
>> unused cached memory:
>>
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
>>
>> There have been a few discussions on implementing such an idea on Linux,
>> but nothing concrete has been achieved.
>>
>> On the kernel side Rik suggested two notification points: "about to
>> swap" (for desktop scenarios) and "about to OOM" (for embedded-like
>> scenarios).
>>
>> With that assumption in mind it would be necessary to either have two
>> special devices for notification, or somehow indicate both events
>> through the same file descriptor.
>>
>> Comments are more than welcome.
>
> Martin was talking about some mad scheme wherein you'd create a bunch of
> pseudo files (say, /proc/foo/0, /proc/foo/1, ..., /proc/foo/9) and each one
> would become "ready" when the MM scanning priority reaches 10%, 20%, ...
> 100%.
>
> Obviously there would need to be a lot of abstraction to unhook a permanent
> userspace feature from a transient kernel implementation, but the basic
> idea is that a process which wants to know when the VM is getting into the
> orange zone would select() on the file "7" and a process which wants to
> know when the VM is getting into the red zone would select() on file "9".
>
> It gets more complicated with NUMA memory nodes and cgroup memory
> controllers.

We ended up not doing that, but making a scanner that saw what
percentage of the LRU was touched in the last n seconds, and
printing that to userspace to deal with.

Turns out priority is a horrible metric to use for this - it
stays at default for ages, then falls off a cliff far too
quickly to react to.
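
The metric described above (share of the LRU touched in the last n seconds) can be illustrated with a small calculation. This is a sketch, not the scanner's actual code: it assumes a per-page last-access timestamp is available, which is a simplification for illustration, and `fraction_touched` is a hypothetical name.

```python
def fraction_touched(last_access, now, window):
    """Return the fraction of tracked pages accessed within the last `window` seconds."""
    if not last_access:
        return 0.0
    recent = sum(1 for t in last_access if now - t <= window)
    return recent / len(last_access)

# Hypothetical last-access times (in seconds) for five pages on the LRU.
pages = [100.0, 97.5, 80.0, 99.0, 40.0]
share = fraction_touched(pages, now=100.0, window=5.0)  # 3 of 5 pages are recent
```

Unlike scanning priority, a figure like this moves gradually as the working set grows, which is what gives userspace time to react.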

* Re: OOM notifications
  2007-10-26 21:05   ` Martin Bligh
@ 2007-10-26 21:11     ` Andrew Morton
  2007-10-26 21:35       ` Rik van Riel
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2007-10-26 21:11 UTC
  To: Martin Bligh; +Cc: marcelo, linux-kernel, drepper, riel, linux-mm

On Fri, 26 Oct 2007 14:05:47 -0700
Martin Bligh <mbligh@mbligh.org> wrote:

> > Martin was talking about some mad scheme wherein you'd create a bunch of
> > pseudo files (say, /proc/foo/0, /proc/foo/1, ..., /proc/foo/9) and each one
> > would become "ready" when the MM scanning priority reaches 10%, 20%, ...
> > 100%.
> >
> > Obviously there would need to be a lot of abstraction to unhook a permanent
> > userspace feature from a transient kernel implementation, but the basic
> > idea is that a process which wants to know when the VM is getting into the
> > orange zone would select() on the file "7" and a process which wants to
> > know when the VM is getting into the red zone would select() on file "9".
> >
> > It gets more complicated with NUMA memory nodes and cgroup memory
> > controllers.
>
> We ended up not doing that, but making a scanner that saw what
> percentage of the LRU was touched in the last n seconds, and
> printing that to userspace to deal with.
>
> Turns out priority is a horrible metric to use for this - it
> stays at default for ages, then falls off a cliff far too
> quickly to react to.

Sure, but in terms of high-level userspace interface, being able to
select() on a group of priority buckets (spread across different nodes,
zones and cgroups) seems a lot more flexible than any signal-based approach
we could come up with.

* Re: OOM notifications
  2007-10-26 21:11     ` Andrew Morton
@ 2007-10-26 21:35       ` Rik van Riel
  2007-10-26 21:59         ` Martin Bligh
  0 siblings, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2007-10-26 21:35 UTC
  To: Andrew Morton; +Cc: Martin Bligh, marcelo, linux-kernel, drepper, linux-mm

On Fri, 26 Oct 2007 14:11:12 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> Sure, but in terms of high-level userspace interface, being able to
> select() on a group of priority buckets (spread across different
> nodes, zones and cgroups) seems a lot more flexible than any
> signal-based approach we could come up with.

Absolutely, the process needs to be able to just poll or
select on a file descriptor from the process main loop.

I am not convinced that the magic of NUMA memory distribution
and NUMA memory pressure should be visible to userspace.  Due
to the thundering herd problem we cannot wake up all of the
processes that select on the file descriptor at the same time
anyway, so we can (later on) add NUMA magic to the process
selection logic in the kernel to only wake up processes on
the right NUMA nodes.

The initial patch probably does not need that.

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
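
Polling a notification descriptor from the process main loop, as described above, could be sketched as follows. Everything here is an assumption for illustration: pipes stand in for both the (nonexistent) kernel notification file and a normal request source, and the handler names are hypothetical.

```python
import os
import selectors

cache = {"a": b"x" * 1024, "b": b"y" * 1024}  # droppable application cache
handled = []                                  # record of events, in order

def on_memory_pressure(fd):
    os.read(fd, 1)        # consume the simulated notification
    cache.clear()         # free memory that can be rebuilt later
    handled.append("pressure")

def on_request(fd):
    handled.append(os.read(fd, 64).decode())

sel = selectors.DefaultSelector()
notify_r, notify_w = os.pipe()  # stand-in for the notification descriptor
work_r, work_w = os.pipe()      # stand-in for the program's normal input
sel.register(notify_r, selectors.EVENT_READ, on_memory_pressure)
sel.register(work_r, selectors.EVENT_READ, on_request)

def run_once(timeout=1.0):
    """One iteration of the main loop: dispatch every ready descriptor."""
    for key, _ in sel.select(timeout):
        key.data(key.fd)

os.write(work_w, b"hello")  # ordinary work arrives
run_once()
os.write(notify_w, b"!")    # simulated memory pressure event
run_once()
```

The memory-pressure descriptor is just one more event source next to the program's regular ones, so no signal handler or extra thread is needed.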

* Re: OOM notifications
  2007-10-26 21:35       ` Rik van Riel
@ 2007-10-26 21:59         ` Martin Bligh
  2007-10-26 22:30           ` Rik van Riel
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Bligh @ 2007-10-26 21:59 UTC
  To: Rik van Riel; +Cc: Andrew Morton, marcelo, linux-kernel, drepper, linux-mm

Rik van Riel wrote:
> On Fri, 26 Oct 2007 14:11:12 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> Sure, but in terms of high-level userspace interface, being able to
>> select() on a group of priority buckets (spread across different
>> nodes, zones and cgroups) seems a lot more flexible than any
>> signal-based approach we could come up with.
>
> Absolutely, the process needs to be able to just poll or
> select on a file descriptor from the process main loop.
>
> I am not convinced that the magic of NUMA memory distribution
> and NUMA memory pressure should be visible to userspace.  Due
> to the thundering herd problem we cannot wake up all of the
> processes that select on the file descriptor at the same time
> anyway, so we can (later on) add NUMA magic to the process
> selection logic in the kernel to only wake up processes on
> the right NUMA nodes.
>
> The initial patch probably does not need that.

Depends if you're using cpusets or not, I think?

* Re: OOM notifications
  2007-10-26 21:59         ` Martin Bligh
@ 2007-10-26 22:30           ` Rik van Riel
  0 siblings, 0 replies; 7+ messages in thread
From: Rik van Riel @ 2007-10-26 22:30 UTC
  To: Martin Bligh; +Cc: Andrew Morton, marcelo, linux-kernel, drepper, linux-mm

On Fri, 26 Oct 2007 14:59:01 -0700
Martin Bligh <mbligh@mbligh.org> wrote:

> Rik van Riel wrote:
> > On Fri, 26 Oct 2007 14:11:12 -0700
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> >> Sure, but in terms of high-level userspace interface, being able to
> >> select() on a group of priority buckets (spread across different
> >> nodes, zones and cgroups) seems a lot more flexible than any
> >> signal-based approach we could come up with.
> >
> > Absolutely, the process needs to be able to just poll or
> > select on a file descriptor from the process main loop.
> >
> > I am not convinced that the magic of NUMA memory distribution
> > and NUMA memory pressure should be visible to userspace.  Due
> > to the thundering herd problem we cannot wake up all of the
> > processes that select on the file descriptor at the same time
> > anyway, so we can (later on) add NUMA magic to the process
> > selection logic in the kernel to only wake up processes on
> > the right NUMA nodes.
> >
> > The initial patch probably does not need that.
>
> Depends if you're using cpusets or not, I think?

The kernel knows on which cpuset a process can run.  The process
itself may have been relocated to a different cpuset at runtime,
without it even knowing.

Because of that I think the magic of which process(es) to wake
up when there is memory pressure in some NUMA node should live
in the kernel.
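
The selection logic argued for above (wake only a few waiters, and only those whose cpuset covers the pressured node) can be modelled in a few lines. This is a userspace model for illustration only, not kernel code; the data layout, `pick_waiters`, and `max_wakeups` are all hypothetical.

```python
def pick_waiters(waiters, pressured_node, max_wakeups):
    """Choose which waiting processes to wake for pressure on one NUMA node.

    waiters: list of (pid, allowed_nodes) in arrival order, where
    allowed_nodes is the set of nodes the process's cpuset currently permits.
    Capping the result at max_wakeups avoids waking every waiter at once.
    """
    eligible = [pid for pid, nodes in waiters if pressured_node in nodes]
    return eligible[:max_wakeups]

# Hypothetical waiters and their current cpuset node masks.
waiters = [(101, {0}), (102, {0, 1}), (103, {1}), (104, {1})]
woken = pick_waiters(waiters, pressured_node=1, max_wakeups=2)
```

Because the node mask is looked up at wake time, a process that was moved to another cpuset without knowing it is still handled correctly, which is the reason given above for keeping this decision in the kernel.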

* Re: OOM notifications
  2007-10-26 21:02 ` OOM notifications Andrew Morton
  2007-10-26 21:05   ` Martin Bligh
@ 2007-10-28 21:16   ` Balbir Singh
  1 sibling, 0 replies; 7+ messages in thread
From: Balbir Singh @ 2007-10-28 21:16 UTC
  To: Andrew Morton
  Cc: Marcelo Tosatti, linux-kernel, drepper, riel, Martin Bligh, linux-mm

Andrew Morton wrote:
> It gets more complicated with NUMA memory nodes and cgroup memory
> controllers.
>

At OLS this year, users wanted user space notification of OOM for the
cgroup memory controller. When a group is about to OOM, a notification
can help an external application re-adjust memory limits across the system.

With some memory kept in reserve for handling OOM, this scheme could be
extended to handle global OOM conditions as well.

--
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
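
What an external application might do on receiving such a notification can be sketched as a limit rebalancing step. This is a hedged illustration, not an existing tool or cgroup interface: the group names, the numbers (think MiB), the `rebalance` function, and the per-donor `reserve` headroom are all assumptions.

```python
def rebalance(limits, usage, needy, want, reserve=0):
    """Raise `needy`'s limit by up to `want`, taking unused headroom from other groups.

    Each donor keeps at least `reserve` above its current usage. The sum of
    all limits is unchanged, so the system as a whole is not overcommitted.
    """
    new_limits = dict(limits)
    remaining = want
    for group in limits:
        if group == needy or remaining == 0:
            continue
        spare = max(0, new_limits[group] - usage[group] - reserve)
        take = min(spare, remaining)
        new_limits[group] -= take
        new_limits[needy] += take
        remaining -= take
    return new_limits

# Hypothetical groups; "web" has just reported that it is about to OOM.
limits = {"web": 400, "batch": 600, "db": 1000}
usage = {"web": 395, "batch": 200, "db": 900}
new_limits = rebalance(limits, usage, needy="web", want=300, reserve=50)
```

Here the whole request is satisfied from "batch", which had the most slack, and "db" is left untouched.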

Thread overview: 7+ messages
       [not found] <20071018201531.GA5938@dmt>
2007-10-26 21:02 ` OOM notifications Andrew Morton
2007-10-26 21:05   ` Martin Bligh
2007-10-26 21:11     ` Andrew Morton
2007-10-26 21:35       ` Rik van Riel
2007-10-26 21:59         ` Martin Bligh
2007-10-26 22:30           ` Rik van Riel
2007-10-28 21:16   ` Balbir Singh