* [PATCH] memcg: Provide knob for force OOM into the memcg @ 2014-12-16 13:25 Chintan Pandya 2014-12-16 13:39 ` Michal Hocko 2014-12-16 16:59 ` Johannes Weiner 0 siblings, 2 replies; 7+ messages in thread From: Chintan Pandya @ 2014-12-16 13:25 UTC (permalink / raw) To: mhocko, hannes, linux-mm, cgroups; +Cc: linux-kernel, Chintan Pandya We may want to use memcg to limit the total memory footprint of all the processes within the one group. This may lead to a situation where any arbitrary process cannot get migrated to that one memcg because its limits will be breached. Or, process can get migrated but even being most recently used process, it can get killed by in-cgroup OOM. To avoid such scenarios, provide a convenient knob by which we can forcefully trigger OOM and make a room for upcoming process. To trigger force OOM, $ echo 1 > /<memcg_path>/memory.force_oom Signed-off-by: Chintan Pandya <cpandya@codeaurora.org> --- mm/memcontrol.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ef91e85..4c68aa7 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3305,6 +3305,30 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg) return 0; } +static int mem_cgroup_force_oom(struct cgroup *cont, unsigned int event) +{ + struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); + int ret; + + if (mem_cgroup_is_root(memcg)) + return -EINVAL; + + css_get(&memcg->css); + ret = mem_cgroup_handle_oom(memcg, GFP_KERNEL, 0); + css_put(&memcg->css); + + return ret; +} + +static int mem_cgroup_force_oom_write(struct cgroup *cgrp, + struct cftype *cft, u64 val) +{ + if (val > 1 || val < 1) + return -EINVAL; + + return mem_cgroup_force_oom(cgrp, 0); +} + static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) @@ -4442,6 +4466,11 @@ static struct cftype mem_cgroup_files[] = { .write = mem_cgroup_force_empty_write, }, { + .name = "force_oom", + .trigger = mem_cgroup_force_oom, + .write_u64 = mem_cgroup_force_oom_write, + }, + { .name = "use_hierarchy", .write_u64 = mem_cgroup_hierarchy_write, .read_u64 = mem_cgroup_hierarchy_read, -- Chintan Pandya QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] memcg: Provide knob for force OOM into the memcg 2014-12-16 13:25 [PATCH] memcg: Provide knob for force OOM into the memcg Chintan Pandya @ 2014-12-16 13:39 ` Michal Hocko [not found] ` <20141216133935.GK22914-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> 2014-12-16 16:59 ` Johannes Weiner 1 sibling, 1 reply; 7+ messages in thread From: Michal Hocko @ 2014-12-16 13:39 UTC (permalink / raw) To: Chintan Pandya; +Cc: hannes, linux-mm, cgroups, linux-kernel On Tue 16-12-14 18:55:35, Chintan Pandya wrote: > We may want to use memcg to limit the total memory > footprint of all the processes within the one group. > This may lead to a situation where any arbitrary > process cannot get migrated to that one memcg > because its limits will be breached. Or, process can > get migrated but even being most recently used > process, it can get killed by in-cgroup OOM. To > avoid such scenarios, provide a convenient knob > by which we can forcefully trigger OOM and make > a room for upcoming process. > > To trigger force OOM, > $ echo 1 > /<memcg_path>/memory.force_oom What would prevent another task deplete that memory shortly after you triggered OOM and end up in the same situation? E.g. while the moving task is migrating its charges to the new group... Why cannot you simply disable OOM killer in that memcg and handle it from userspace properly? > Signed-off-by: Chintan Pandya <cpandya@codeaurora.org> > --- > mm/memcontrol.c | 29 +++++++++++++++++++++++++++++ > 1 file changed, 29 insertions(+) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ef91e85..4c68aa7 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -3305,6 +3305,30 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg) > return 0; > } > > +static int mem_cgroup_force_oom(struct cgroup *cont, unsigned int event) > +{ > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > + int ret; > + > + if (mem_cgroup_is_root(memcg)) > + return -EINVAL; > + > + css_get(&memcg->css); > + ret = mem_cgroup_handle_oom(memcg, GFP_KERNEL, 0); > + css_put(&memcg->css); > + > + return ret; > +} > + > +static int mem_cgroup_force_oom_write(struct cgroup *cgrp, > + struct cftype *cft, u64 val) > +{ > + if (val > 1 || val < 1) > + return -EINVAL; > + > + return mem_cgroup_force_oom(cgrp, 0); > +} > + > static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, > char *buf, size_t nbytes, > loff_t off) > @@ -4442,6 +4466,11 @@ static struct cftype mem_cgroup_files[] = { > .write = mem_cgroup_force_empty_write, > }, > { > + .name = "force_oom", > + .trigger = mem_cgroup_force_oom, > + .write_u64 = mem_cgroup_force_oom_write, > + }, > + { > .name = "use_hierarchy", > .write_u64 = mem_cgroup_hierarchy_write, > .read_u64 = mem_cgroup_hierarchy_read, > -- > Chintan Pandya > > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a > member of the Code Aurora Forum, hosted by The Linux Foundation > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20141216133935.GK22914-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH] memcg: Provide knob for force OOM into the memcg [not found] ` <20141216133935.GK22914-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> @ 2014-12-16 22:33 ` David Rientjes 2014-12-17 11:47 ` Chintan Pandya 0 siblings, 1 reply; 7+ messages in thread From: David Rientjes @ 2014-12-16 22:33 UTC (permalink / raw) To: Michal Hocko Cc: Chintan Pandya, hannes-druUgvl0LCNAfugRpC6u6w, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Tue, 16 Dec 2014, Michal Hocko wrote: > > We may want to use memcg to limit the total memory > > footprint of all the processes within the one group. > > This may lead to a situation where any arbitrary > > process cannot get migrated to that one memcg > > because its limits will be breached. Or, process can > > get migrated but even being most recently used > > process, it can get killed by in-cgroup OOM. To > > avoid such scenarios, provide a convenient knob > > by which we can forcefully trigger OOM and make > > a room for upcoming process. > > > > To trigger force OOM, > > $ echo 1 > /<memcg_path>/memory.force_oom > > What would prevent another task deplete that memory shortly after you > triggered OOM and end up in the same situation? E.g. while the moving > task is migrating its charges to the new group... > > Why cannot you simply disable OOM killer in that memcg and handle it > from userspace properly? > The patch is introducing a mechanism to induce a kernel oom kill for a memcg hierarchy to make room for it in the new memcg, not disable the oom killer so the migration fails due to the lower limits. It doesn't have any basis since a SIGKILL coming from userspace should be considered the same as a kernel oom kill from the memcg perspective, i.e. the fatal_signal_pending() checks that allow charge bypass instead of a strict reliance on TIF_MEMDIE being set. It seems to be proposed as a shortcut so that the kernel will determine the best process to kill. That information is available to userspace so it should be able to just SIGKILL the desired process (either in the destination memcg or in the source memcg to allow deletion), so this functionality isn't needed in the kernel. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] memcg: Provide knob for force OOM into the memcg 2014-12-16 22:33 ` David Rientjes @ 2014-12-17 11:47 ` Chintan Pandya 0 siblings, 0 replies; 7+ messages in thread From: Chintan Pandya @ 2014-12-17 11:47 UTC (permalink / raw) To: David Rientjes; +Cc: Michal Hocko, hannes, linux-mm, cgroups, linux-kernel On 12/17/2014 04:03 AM, David Rientjes wrote: > On Tue, 16 Dec 2014, Michal Hocko wrote: > >>> We may want to use memcg to limit the total memory >>> footprint of all the processes within the one group. >>> This may lead to a situation where any arbitrary >>> process cannot get migrated to that one memcg >>> because its limits will be breached. Or, process can >>> get migrated but even being most recently used >>> process, it can get killed by in-cgroup OOM. To >>> avoid such scenarios, provide a convenient knob >>> by which we can forcefully trigger OOM and make >>> a room for upcoming process. >>> >>> To trigger force OOM, >>> $ echo 1> /<memcg_path>/memory.force_oom >> >> What would prevent another task deplete that memory shortly after you >> triggered OOM and end up in the same situation? E.g. while the moving >> task is migrating its charges to the new group... Idea was to trigger an OOM until we can migrate any particular process onto desired cgroup. >> >> Why cannot you simply disable OOM killer in that memcg and handle it >> from userspace properly? Well, this can be done it seems. Let me explore around this. Thanks for this suggestion. > It seems to be proposed as a shortcut so that the kernel will determine > the best process to kill. That information is available to userspace so > it should be able to just SIGKILL the desired process (either in the > destination memcg or in the source memcg to allow deletion), so this > functionality isn't needed in the kernel. Yes, this can be seen as a shortcut because we are off-loading some task-selection to be killed by OOM on kernel rather than userspace decides by itself. -- Chintan Pandya QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] memcg: Provide knob for force OOM into the memcg 2014-12-16 13:25 [PATCH] memcg: Provide knob for force OOM into the memcg Chintan Pandya 2014-12-16 13:39 ` Michal Hocko @ 2014-12-16 16:59 ` Johannes Weiner [not found] ` <20141216165922.GA30984-HTCKtW7iVlxqnrmGgq4/JMIURNUf+fel@public.gmane.org> 1 sibling, 1 reply; 7+ messages in thread From: Johannes Weiner @ 2014-12-16 16:59 UTC (permalink / raw) To: Chintan Pandya; +Cc: mhocko, linux-mm, cgroups, linux-kernel On Tue, Dec 16, 2014 at 06:55:35PM +0530, Chintan Pandya wrote: > We may want to use memcg to limit the total memory > footprint of all the processes within the one group. > This may lead to a situation where any arbitrary > process cannot get migrated to that one memcg > because its limits will be breached. Or, process can > get migrated but even being most recently used > process, it can get killed by in-cgroup OOM. To > avoid such scenarios, provide a convenient knob > by which we can forcefully trigger OOM and make > a room for upcoming process. Why do you move tasks around during runtime? Rather than scanning thousands or millions of page table entries to relocate a task and its private memory to another configuration domain, wouldn't it be easier to just keep the task in a dedicated cgroup and reconfigure that instead? There doesn't seem to be a strong usecase for charge migration that couldn't be solved by doing things slightly differently from userspace. Certainly not something that justifies the complexity that it adds to memcg model and it's synchronization requirements from VM hotpaths. Hence, I'm inclined to not add charge moving to version 2 of memcg. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20141216165922.GA30984-HTCKtW7iVlxqnrmGgq4/JMIURNUf+fel@public.gmane.org>]
* Re: [PATCH] memcg: Provide knob for force OOM into the memcg [not found] ` <20141216165922.GA30984-HTCKtW7iVlxqnrmGgq4/JMIURNUf+fel@public.gmane.org> @ 2014-12-17 12:11 ` Chintan Pandya 2014-12-19 21:15 ` Johannes Weiner 0 siblings, 1 reply; 7+ messages in thread From: Chintan Pandya @ 2014-12-17 12:11 UTC (permalink / raw) To: Johannes Weiner Cc: mhocko-AlSwsSmVLrQ, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA > Why do you move tasks around during runtime? Rather than scanning > thousands or millions of page table entries to relocate a task and its > private memory to another configuration domain, wouldn't it be easier to > just keep the task in a dedicated cgroup and reconfigure that instead? Your suggestion is good. But in specific cases, we may have no choice but to migrate. Take a case of an Android system where a process/app will never gets killed until there is really no scope of holding it any longer in RAM. So, when that process was running as a foreground process, it has to belong to a group which has no memory limit and cannot be killed. Now, when the same process goes into background and sits idle, it can be compressed and cached into some space in RAM. These cached processes are ever growing list and can be capped with some limit. Naturally, these processes belongs to different category and hence different cgroup which just controls such cached processes. > > There doesn't seem to be a strong usecase for charge migration that > couldn't be solved by doing things slightly differently from userspace. > Certainly not something that justifies the complexity that it adds to > memcg model and it's synchronization requirements from VM hotpaths. > Hence, I'm inclined to not add charge moving to version 2 of memcg. Do you say charge migration is discouraged at runtime ? Difficult to live with this limitation. -- Chintan Pandya QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] memcg: Provide knob for force OOM into the memcg 2014-12-17 12:11 ` Chintan Pandya @ 2014-12-19 21:15 ` Johannes Weiner 0 siblings, 0 replies; 7+ messages in thread From: Johannes Weiner @ 2014-12-19 21:15 UTC (permalink / raw) To: Chintan Pandya; +Cc: mhocko, linux-mm, cgroups, linux-kernel On Wed, Dec 17, 2014 at 05:41:29PM +0530, Chintan Pandya wrote: > > >Why do you move tasks around during runtime? Rather than scanning > >thousands or millions of page table entries to relocate a task and its > >private memory to another configuration domain, wouldn't it be easier to > >just keep the task in a dedicated cgroup and reconfigure that instead? > > Your suggestion is good. But in specific cases, we may have no choice but to > migrate. > > Take a case of an Android system where a process/app will never gets killed > until there is really no scope of holding it any longer in RAM. So, when > that process was running as a foreground process, it has to belong to a > group which has no memory limit and cannot be killed. Now, when the same > process goes into background and sits idle, it can be compressed and cached > into some space in RAM. These cached processes are ever growing list and can > be capped with some limit. Naturally, these processes belongs to different > category and hence different cgroup which just controls such cached > processes. This is a valid usecase that should supported, but there has to be a better way to do it then to move potentially hundreds of megabytes, page by page, during the task switch. It's a massive amount of work. But there are also several problematic design implications in moving memory along with a task. For one, moving a task involves two separate cgroups that can have an arbitrary number of controllers attached to them. If this complex operation fails, you have no idea which component is at fault - was it a generic cgroup problem, or does the move violate a rule in one of the controllers? And *if* it fails, what action do you take? Does the app simply remain a foreground app? Likewise, if you move the task but fail to migrate some pages, do you leave them behind in the foreground group where they are exempt from reclaim? Conflating task organization and resource control like this just isn't a good idea. Secondly, memory tracked through cgroups does not belong to the task, it belongs to the cgroup, and we are limited to unreliable heuristics when determining a task-page relationship. The pages that can't be attributed properly - like unmapped page cache - again will be left behind, polluting your reclaim-exempt foreground group. Another aspect is that the foreground domain is variable in size, but you are assigning it a minimum amount of space by statically limiting the background apps. If the foreground doesn't use that space, you end up killing cached apps for no reason and waste even more memory. Imagine you run a large app like Maps and then switch to a tiny note taking app to look something up. Now you're kicking Maps out of the cache for no reason. The last point is a much more generic problem. Static limits are not suited to efficiently partition a machine for realistic workloads. This is why version 2 will move away from the idea of static hard limits as the primary means of partitioning, and towards a model that has the user configure upper and lower boundaries for the expected workingset size of each group. Static limits will be relegated to failsafe measures and hard requirements. Setting a group's lower boundary will tell the kernel how much memory the group requires at a minimum to function properly, and the kernel will try to avoid reclaiming and OOM killing groups within their lower boundary at the expense of groups that are in excess of theirs. Your configuration can be rephrased using this: by putting all apps into their own groups, and setting the lower boundary to infinity for the foreground apps and to zero for the background apps, the kernel will always reclaim and OOM kill the background apps first. You get the same foreground app protection as before, but with several advantages. Firstly, it separates the task grouping of an app from memory policy, which allows you to track your apps as self-contained bundles of tasks and memory. You are no longer required to conflate unrelated apps for the sake of memory policy, only to reshuffle and break them apart again using inaccurate separation heuristics that will end up polluting *both* domains. Secondly, background apps will no longer get killed based on a static quota, but based on actual memory pressure. You configure the policy, and the kernel decides on demand where to get the required memory. And lastly, you don't have to physically move thousands of pages on every task switch anymore, AND pay the synchronization overhead that stems from pages changing cgroups during runtime. Your use case is valid, but charge migration doesn't seem to be the right answer here. And I really doubt it's ever the right answer. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-12-19 21:15 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-16 13:25 [PATCH] memcg: Provide knob for force OOM into the memcg Chintan Pandya
2014-12-16 13:39 ` Michal Hocko
[not found] ` <20141216133935.GK22914-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2014-12-16 22:33 ` David Rientjes
2014-12-17 11:47 ` Chintan Pandya
2014-12-16 16:59 ` Johannes Weiner
[not found] ` <20141216165922.GA30984-HTCKtW7iVlxqnrmGgq4/JMIURNUf+fel@public.gmane.org>
2014-12-17 12:11 ` Chintan Pandya
2014-12-19 21:15 ` Johannes Weiner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox