From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linuxfoundation.org ([140.211.169.12]:48405 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751625AbcDJBGN (ORCPT ); Sat, 9 Apr 2016 21:06:13 -0400 Subject: Patch "mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage" has been added to the 4.5-stable tree To: hannes@cmpxchg.org, akpm@linux-foundation.org, gregkh@linuxfoundation.org, mhocko@suse.com, torvalds@linux-foundation.org, vdavydov@virtuozzo.com Cc: , From: Date: Sat, 09 Apr 2016 18:06:12 -0700 Message-ID: <146025037264207@kroah.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ANSI_X3.4-1968 Content-Transfer-Encoding: 8bit Sender: stable-owner@vger.kernel.org List-ID: This is a note to let you know that I've just added the patch titled mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage to the 4.5-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: mm-memcontrol-reclaim-and-oom-kill-when-shrinking-memory.max-below-usage.patch and it can be found in the queue-4.5 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. >>From b6e6edcfa40561e9c8abe5eecf1c96f8e5fd9c6f Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Thu, 17 Mar 2016 14:20:28 -0700 Subject: mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage From: Johannes Weiner commit b6e6edcfa40561e9c8abe5eecf1c96f8e5fd9c6f upstream. Setting the original memory.limit_in_bytes hardlimit is subject to a race condition when the desired value is below the current usage. The code tries a few times to first reclaim and then see if the usage has dropped to where we would like it to be, but there is no locking, and the workload is free to continue making new charges up to the old limit. Thus, attempting to shrink a workload relies on pure luck and hope that the workload happens to cooperate. To fix this in the cgroup2 memory.max knob, do it the other way round: set the limit first, then try enforcement. And if reclaim is not able to succeed, trigger OOM kills in the group. Keep going until the new limit is met, we run out of OOM victims and there's only unreclaimable memory left, or the task writing to memory.max is killed. This allows users to shrink groups reliably, and the behavior is consistent with what happens when new charges are attempted in excess of memory.max. Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- Documentation/cgroup-v2.txt | 6 ++++++ mm/memcontrol.c | 38 ++++++++++++++++++++++++++++++++++---- 2 files changed, 40 insertions(+), 4 deletions(-) --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -1368,6 +1368,12 @@ system than killing the group. Otherwis limit this type of spillover and ultimately contain buggy or even malicious applications. +Setting the original memory.limit_in_bytes below the current usage was +subject to a race condition, where concurrent charges could cause the +limit setting to fail. memory.max on the other hand will first set the +limit to prevent new charges, and then reclaim and OOM kill until the +new limit is met - or the task writing to memory.max is killed. + The combined memory+swap accounting and limiting is replaced by real control over swap space. --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1262,7 +1262,7 @@ static unsigned long mem_cgroup_get_limi return limit; } -static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, +static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, int order) { struct oom_control oc = { @@ -1340,6 +1340,7 @@ static void mem_cgroup_out_of_memory(str } unlock: mutex_unlock(&oom_lock); + return chosen; } #if MAX_NUMNODES > 1 @@ -5088,6 +5089,8 @@ static ssize_t memory_max_write(struct k char *buf, size_t nbytes, loff_t off) { struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + unsigned int nr_reclaims = MEM_CGROUP_RECLAIM_RETRIES; + bool drained = false; unsigned long max; int err; @@ -5096,9 +5099,36 @@ static ssize_t memory_max_write(struct k if (err) return err; - err = mem_cgroup_resize_limit(memcg, max); - if (err) - return err; + xchg(&memcg->memory.limit, max); + + for (;;) { + unsigned long nr_pages = page_counter_read(&memcg->memory); + + if (nr_pages <= max) + break; + + if (signal_pending(current)) { + err = -EINTR; + break; + } + + if (!drained) { + drain_all_stock(memcg); + drained = true; + continue; + } + + if (nr_reclaims) { + if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max, + GFP_KERNEL, true)) + nr_reclaims--; + continue; + } + + mem_cgroup_events(memcg, MEMCG_OOM, 1); + if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) + break; + } memcg_wb_domain_size_changed(memcg); return nbytes; Patches currently in stable-queue which might be from hannes@cmpxchg.org are queue-4.5/mm-memcontrol-reclaim-when-shrinking-memory.high-below-usage.patch queue-4.5/mm-memcontrol-reclaim-and-oom-kill-when-shrinking-memory.max-below-usage.patch