* [PATCH v2 0/2] mm, oom: do not grant oom victims full memory reserves access @ 2017-08-10 7:50 Michal Hocko 2017-08-10 7:50 ` [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for " Michal Hocko 2017-08-10 7:50 ` [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Michal Hocko 0 siblings, 2 replies; 11+ messages in thread From: Michal Hocko @ 2017-08-10 7:50 UTC (permalink / raw) To: Andrew Morton Cc: Mel Gorman, Tetsuo Handa, David Rientjes, Johannes Weiner, Roman Gushchin, linux-mm, LKML Hi, this is a second version of the series previously posted [1]. Tetsuo has noticed a bug I've introduced during the rebase and also pointed out reserves access changes on nommu arches which I have addressed as well. Mel had some more feedback which is hopefully addressed as well. There were no further comments so I am reposting and asking for inclusion. this is a part of a larger series I posted back in Oct last year [2]. I have dropped patch 3 because it was incorrect and patch 4 is not applicable without it. The primary reason to apply patch 1 is to remove a risk of the complete memory depletion by oom victims. While this is a theoretical risk right now there is a demand for memcg aware oom killer which might kill all processes inside a memcg which can be a lot of tasks. That would make the risk quite real. This issue is addressed by limiting access to memory reserves. We no longer use TIF_MEMDIE to grant the access and use tsk_is_oom_victim instead. See Patch 1 for more details. Patch 2 is a trivial follow up cleanup. I would still like to get rid of TIF_MEMDIE completely but I do not have time to do it now and it is not a pressing issue. [1] http://lkml.kernel.org/r/20170727090357.3205-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/20161004090009.7974-1-mhocko@kernel.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for memory reserves access 2017-08-10 7:50 [PATCH v2 0/2] mm, oom: do not grant oom victims full memory reserves access Michal Hocko @ 2017-08-10 7:50 ` Michal Hocko 2017-08-10 7:50 ` [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Michal Hocko 1 sibling, 0 replies; 11+ messages in thread From: Michal Hocko @ 2017-08-10 7:50 UTC (permalink / raw) To: Andrew Morton Cc: Mel Gorman, Tetsuo Handa, David Rientjes, Johannes Weiner, Roman Gushchin, linux-mm, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> For ages we have been relying on TIF_MEMDIE thread flag to mark OOM victims and then, among other things, to give these threads full access to memory reserves. There are few shortcomings of this implementation, though. First of all and the most serious one is that the full access to memory reserves is quite dangerous because we leave no safety room for the system to operate and potentially do last emergency steps to move on. Secondly this flag is per task_struct while the OOM killer operates on mm_struct granularity so all processes sharing the given mm are killed. Giving the full access to all these task_structs could lead to a quick memory reserves depletion. We have tried to reduce this risk by giving TIF_MEMDIE only to the main thread and the currently allocating task but that doesn't really solve this problem while it surely opens up a room for corner cases - e.g. GFP_NO{FS,IO} requests might loop inside the allocator without access to memory reserves because a particular thread was not the group leader. Now that we have the oom reaper and that all oom victims are reapable after 1b51e65eab64 ("oom, oom_reaper: allow to reap mm shared by the kthreads") we can be more conservative and grant only partial access to memory reserves because there are reasonable chances of the parallel memory freeing. We still want some access to reserves because we do not want other consumers to eat up the victim's freed memory. oom victims will still contend with __GFP_HIGH users but those shouldn't be so aggressive to starve oom victims completely. Introduce ALLOC_OOM flag and give all tsk_is_oom_victim tasks access to the half of the reserves. This makes the access to reserves independent on which task has passed through mark_oom_victim. Also drop any usage of TIF_MEMDIE from the page allocator proper and replace it by tsk_is_oom_victim as well which will make page_alloc.c completely TIF_MEMDIE free finally. CONFIG_MMU=n doesn't have oom reaper so let's stick to the original ALLOC_NO_WATERMARKS approach. There is a demand to make the oom killer memcg aware which will imply many tasks killed at once. This change will allow such a usecase without worrying about complete memory reserves depletion. Changes since v1 - do not play tricks with nommu and grant access to memory reserves as long as TIF_MEMDIE is set - break out from allocation properly for oom victims as per Tetsuo - distinguish oom victims from other consumers of memory reserves in __gfp_pfmemalloc_flags - per Tetsuo - clarify access to memory reserves in __zone_watermark_ok - per Mel - make int->bool conversion in gfp_pfmemalloc_allowed more robust - per Mel - s@reserves@reserve_flags@ in __alloc_pages_slowpath - per Mel Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> --- mm/internal.h | 11 +++++++++ mm/oom_kill.c | 9 ++++--- mm/page_alloc.c | 76 ++++++++++++++++++++++++++++++++++++++++++--------------- 3 files changed, 73 insertions(+), 23 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 24d88f084705..1ebcb1ed01b5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -480,6 +480,17 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, /* Mask to get the watermark bits */ #define ALLOC_WMARK_MASK (ALLOC_NO_WATERMARKS-1) +/* + * Only MMU archs have async oom victim reclaim - aka oom_reaper so we + * cannot assume a reduced access to memory reserves is sufficient for + * !MMU + */ +#ifdef CONFIG_MMU +#define ALLOC_OOM 0x08 +#else +#define ALLOC_OOM ALLOC_NO_WATERMARKS +#endif + #define ALLOC_HARDER 0x10 /* try to alloc harder */ #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */ #define ALLOC_CPUSET 0x40 /* check for correct cpuset */ diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 9e8b4f030c1c..c9f3569a76c7 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -824,7 +824,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message) /* * If the task is already exiting, don't alarm the sysadmin or kill - * its children or threads, just set TIF_MEMDIE so it can die quickly + * its children or threads, just give it access to memory reserves + * so it can die quickly */ task_lock(p); if (task_will_free_mem(p)) { @@ -889,9 +890,9 @@ static void oom_kill_process(struct oom_control *oc, const char *message) count_memcg_event_mm(mm, OOM_KILL); /* - * We should send SIGKILL before setting TIF_MEMDIE in order to prevent - * the OOM victim from depleting the memory reserves from the user - * space under its control. + * We should send SIGKILL before granting access to memory reserves + * in order to prevent the OOM victim from depleting the memory + * reserves from the user space under its control. */ do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true); mark_oom_victim(victim); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 80e4adb4c360..90e331e4c077 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2930,7 +2930,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, { long min = mark; int o; - const bool alloc_harder = (alloc_flags & ALLOC_HARDER); + const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); /* free_pages may go negative - that's OK */ free_pages -= (1 << order) - 1; @@ -2943,10 +2943,21 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, * the high-atomic reserves. This will over-estimate the size of the * atomic reserve but it avoids a search. */ - if (likely(!alloc_harder)) + if (likely(!alloc_harder)) { free_pages -= z->nr_reserved_highatomic; - else - min -= min / 4; + } else { + /* + * OOM victims can try even harder than normal ALLOC_HARDER + * users on the grounds that it's definitely going to be in + * the exit path shortly and free memory. Any allocation it + * makes during the free path will be small and short-lived. + */ + if (alloc_flags & ALLOC_OOM) + min -= min / 2; + else + min -= min / 4; + } + #ifdef CONFIG_CMA /* If allocation can't use CMA areas don't use free CMA pages */ @@ -3184,7 +3195,7 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask) * of allowed nodes. */ if (!(gfp_mask & __GFP_NOMEMALLOC)) - if (test_thread_flag(TIF_MEMDIE) || + if (tsk_is_oom_victim(current) || (current->flags & (PF_MEMALLOC | PF_EXITING))) filter &= ~SHOW_MEM_FILTER_NODES; if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM)) @@ -3603,21 +3614,46 @@ gfp_to_alloc_flags(gfp_t gfp_mask) return alloc_flags; } -bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) +static bool oom_reserves_allowed(struct task_struct *tsk) { - if (unlikely(gfp_mask & __GFP_NOMEMALLOC)) + if (!tsk_is_oom_victim(tsk)) + return false; + + /* + * !MMU doesn't have oom reaper so give access to memory reserves + * only to the thread with TIF_MEMDIE set + */ + if (!IS_ENABLED(CONFIG_MMU) && !test_thread_flag(TIF_MEMDIE)) return false; + return true; +} + +/* + * Distinguish requests which really need access to full memory + * reserves from oom victims which can live with a portion of it + */ +static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask) +{ + if (unlikely(gfp_mask & __GFP_NOMEMALLOC)) + return 0; if (gfp_mask & __GFP_MEMALLOC) - return true; + return ALLOC_NO_WATERMARKS; if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) - return true; - if (!in_interrupt() && - ((current->flags & PF_MEMALLOC) || - unlikely(test_thread_flag(TIF_MEMDIE)))) - return true; + return ALLOC_NO_WATERMARKS; + if (!in_interrupt()) { + if (current->flags & PF_MEMALLOC) + return ALLOC_NO_WATERMARKS; + else if (oom_reserves_allowed(current)) + return ALLOC_OOM; + } - return false; + return 0; +} + +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) +{ + return !!__gfp_pfmemalloc_flags(gfp_mask); } /* @@ -3770,6 +3806,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, unsigned long alloc_start = jiffies; unsigned int stall_timeout = 10 * HZ; unsigned int cpuset_mems_cookie; + int reserve_flags; /* * In the slowpath, we sanity check order to avoid ever trying to @@ -3875,15 +3912,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, if (gfp_mask & __GFP_KSWAPD_RECLAIM) wake_all_kswapds(order, ac); - if (gfp_pfmemalloc_allowed(gfp_mask)) - alloc_flags = ALLOC_NO_WATERMARKS; + reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); + if (reserve_flags) + alloc_flags = reserve_flags; /* * Reset the zonelist iterators if memory policies can be ignored. * These allocations are high priority and system rather than user * orientated. */ - if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) { + if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) { ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, ac->high_zoneidx, ac->nodemask); @@ -3960,8 +3998,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto got_pg; /* Avoid allocations with no watermarks from looping endlessly */ - if (test_thread_flag(TIF_MEMDIE) && - (alloc_flags == ALLOC_NO_WATERMARKS || + if (tsk_is_oom_victim(current) && + (alloc_flags == ALLOC_OOM || (gfp_mask & __GFP_NOMEMALLOC))) goto nopage; -- 2.13.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-08-10 7:50 [PATCH v2 0/2] mm, oom: do not grant oom victims full memory reserves access Michal Hocko 2017-08-10 7:50 ` [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for " Michal Hocko @ 2017-08-10 7:50 ` Michal Hocko 1 sibling, 0 replies; 11+ messages in thread From: Michal Hocko @ 2017-08-10 7:50 UTC (permalink / raw) To: Andrew Morton Cc: Mel Gorman, Tetsuo Handa, David Rientjes, Johannes Weiner, Roman Gushchin, linux-mm, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> TIF_MEMDIE is set only to the tasks whick were either directly selected by the OOM killer or passed through mark_oom_victim from the allocator path. tsk_is_oom_victim is more generic and allows to identify all tasks (threads) which share the mm with the oom victim. Please note that the freezer still needs to check TIF_MEMDIE because we cannot thaw tasks which do not participage in oom_victims counting otherwise a !TIF_MEMDIE task could interfere after oom_disbale returns. Changes since v1 - fix implicit declaration of function 'tsk_is_oom_victim' reported by 0day Signed-off-by: Michal Hocko <mhocko@suse.com> --- kernel/cgroup/cpuset.c | 9 +++++---- mm/memcontrol.c | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index ca8376e5008c..734ae4fa9775 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -56,6 +56,7 @@ #include <linux/time64.h> #include <linux/backing-dev.h> #include <linux/sort.h> +#include <linux/oom.h> #include <linux/uaccess.h> #include <linux/atomic.h> @@ -2497,12 +2498,12 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs) * If we're in interrupt, yes, we can always allocate. If @node is set in * current's mems_allowed, yes. If it's not a __GFP_HARDWALL request and this * node is set in the nearest hardwalled cpuset ancestor to current's cpuset, - * yes. If current has access to memory reserves due to TIF_MEMDIE, yes. + * yes. If current has access to memory reserves as an oom victim, yes. * Otherwise, no. * * GFP_USER allocations are marked with the __GFP_HARDWALL bit, * and do not allow allocations outside the current tasks cpuset - * unless the task has been OOM killed as is marked TIF_MEMDIE. + * unless the task has been OOM killed. * GFP_KERNEL allocations are not so marked, so can escape to the * nearest enclosing hardwalled ancestor cpuset. * @@ -2525,7 +2526,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs) * affect that: * in_interrupt - any node ok (current task context irrelevant) * GFP_ATOMIC - any node ok - * TIF_MEMDIE - any node ok + * tsk_is_oom_victim - any node ok * GFP_KERNEL - any node in enclosing hardwalled cpuset ok * GFP_USER - only nodes in current tasks mems allowed ok. */ @@ -2543,7 +2544,7 @@ bool __cpuset_node_allowed(int node, gfp_t gfp_mask) * Allow tasks that have access to memory reserves because they have * been OOM killed to get memory anywhere. */ - if (unlikely(test_thread_flag(TIF_MEMDIE))) + if (unlikely(tsk_is_oom_victim(current))) return true; if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */ return false; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 544d47e5cbbd..86a48affb938 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * bypass the last charges so that they can exit quickly and * free their memory. */ - if (unlikely(test_thread_flag(TIF_MEMDIE) || + if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) goto force; -- 2.13.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access @ 2017-07-27 9:03 Michal Hocko 2017-07-27 9:03 ` [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Michal Hocko 0 siblings, 1 reply; 11+ messages in thread From: Michal Hocko @ 2017-07-27 9:03 UTC (permalink / raw) To: Andrew Morton Cc: David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm, LKML Hi, this is a part of a larger series I posted back in Oct last year [1]. I have dropped patch 3 because it was incorrect and patch 4 is not applicable without it. The primary reason to apply patch 1 is to remove a risk of the complete memory depletion by oom victims. While this is a theoretical risk right now there is a demand for memcg aware oom killer which might kill all processes inside a memcg which can be a lot of tasks. That would make the risk quite real. This issue is addressed by limiting access to memory reserves. We no longer use TIF_MEMDIE to grant the access and use tsk_is_oom_victim instead. See Patch 1 for more details. Patch 2 is a trivial follow up cleanup. I would still like to get rid of TIF_MEMDIE completely but I do not have time to do it now and it is not a pressing issue. [1] http://lkml.kernel.org/r/20161004090009.7974-1-mhocko@kernel.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-27 9:03 [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access Michal Hocko @ 2017-07-27 9:03 ` Michal Hocko 2017-07-27 14:01 ` Tetsuo Handa 2017-07-29 8:33 ` kbuild test robot 0 siblings, 2 replies; 11+ messages in thread From: Michal Hocko @ 2017-07-27 9:03 UTC (permalink / raw) To: Andrew Morton Cc: David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> TIF_MEMDIE is set only to the tasks whick were either directly selected by the OOM killer or passed through mark_oom_victim from the allocator path. tsk_is_oom_victim is more generic and allows to identify all tasks (threads) which share the mm with the oom victim. Please note that the freezer still needs to check TIF_MEMDIE because we cannot thaw tasks which do not participage in oom_victims counting otherwise a !TIF_MEMDIE task could interfere after oom_disbale returns. Signed-off-by: Michal Hocko <mhocko@suse.com> --- kernel/cgroup/cpuset.c | 8 ++++---- mm/memcontrol.c | 2 +- mm/oom_kill.c | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index ca8376e5008c..1cc53dff0d94 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2497,12 +2497,12 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs) * If we're in interrupt, yes, we can always allocate. If @node is set in * current's mems_allowed, yes. If it's not a __GFP_HARDWALL request and this * node is set in the nearest hardwalled cpuset ancestor to current's cpuset, - * yes. If current has access to memory reserves due to TIF_MEMDIE, yes. + * yes. If current has access to memory reserves as an oom victim, yes. * Otherwise, no. * * GFP_USER allocations are marked with the __GFP_HARDWALL bit, * and do not allow allocations outside the current tasks cpuset - * unless the task has been OOM killed as is marked TIF_MEMDIE. + * unless the task has been OOM killed. * GFP_KERNEL allocations are not so marked, so can escape to the * nearest enclosing hardwalled ancestor cpuset. * @@ -2525,7 +2525,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs) * affect that: * in_interrupt - any node ok (current task context irrelevant) * GFP_ATOMIC - any node ok - * TIF_MEMDIE - any node ok + * tsk_is_oom_victim - any node ok * GFP_KERNEL - any node in enclosing hardwalled cpuset ok * GFP_USER - only nodes in current tasks mems allowed ok. */ @@ -2543,7 +2543,7 @@ bool __cpuset_node_allowed(int node, gfp_t gfp_mask) * Allow tasks that have access to memory reserves because they have * been OOM killed to get memory anywhere. */ - if (unlikely(test_thread_flag(TIF_MEMDIE))) + if (unlikely(tsk_is_oom_victim(current))) return true; if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */ return false; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 544d47e5cbbd..86a48affb938 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * bypass the last charges so that they can exit quickly and * free their memory. */ - if (unlikely(test_thread_flag(TIF_MEMDIE) || + if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) goto force; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index c9f3569a76c7..65cc2f9aaa05 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -483,7 +483,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) * [...] * out_of_memory * select_bad_process - * # no TIF_MEMDIE task selects new victim + * # no TIF_MEMDIE, selects new victim * unmap_page_range # frees some memory */ mutex_lock(&oom_lock); -- 2.13.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-27 9:03 ` [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Michal Hocko @ 2017-07-27 14:01 ` Tetsuo Handa 2017-07-27 14:08 ` Tetsuo Handa ` (2 more replies) 2017-07-29 8:33 ` kbuild test robot 1 sibling, 3 replies; 11+ messages in thread From: Tetsuo Handa @ 2017-07-27 14:01 UTC (permalink / raw) To: mhocko, akpm; +Cc: rientjes, hannes, guro, linux-mm, linux-kernel, mhocko Michal Hocko wrote: > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 544d47e5cbbd..86a48affb938 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, > * bypass the last charges so that they can exit quickly and > * free their memory. > */ > - if (unlikely(test_thread_flag(TIF_MEMDIE) || > + if (unlikely(tsk_is_oom_victim(current) || > fatal_signal_pending(current) || > current->flags & PF_EXITING)) > goto force; Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ? > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index c9f3569a76c7..65cc2f9aaa05 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -483,7 +483,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) > * [...] > * out_of_memory > * select_bad_process > - * # no TIF_MEMDIE task selects new victim > + * # no TIF_MEMDIE, selects new victim > * unmap_page_range # frees some memory > */ > mutex_lock(&oom_lock); This comment is wrong. No MMF_OOM_SKIP mm selects new victim. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-27 14:01 ` Tetsuo Handa @ 2017-07-27 14:08 ` Tetsuo Handa 2017-07-27 14:18 ` Michal Hocko 2017-07-27 14:45 ` Michal Hocko 2 siblings, 0 replies; 11+ messages in thread From: Tetsuo Handa @ 2017-07-27 14:08 UTC (permalink / raw) To: mhocko, akpm; +Cc: rientjes, hannes, guro, linux-mm, linux-kernel, mhocko Tetsuo Handa wrote: > Michal Hocko wrote: > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 544d47e5cbbd..86a48affb938 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, > > * bypass the last charges so that they can exit quickly and > > * free their memory. > > */ > > - if (unlikely(test_thread_flag(TIF_MEMDIE) || > > + if (unlikely(tsk_is_oom_victim(current) || > > fatal_signal_pending(current) || > > current->flags & PF_EXITING)) > > goto force; > > Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ? > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index c9f3569a76c7..65cc2f9aaa05 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -483,7 +483,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) > > * [...] > > * out_of_memory > > * select_bad_process > > - * # no TIF_MEMDIE task selects new victim > > + * # no TIF_MEMDIE, selects new victim > > * unmap_page_range # frees some memory > > */ > > mutex_lock(&oom_lock); > > This comment is wrong. No MMF_OOM_SKIP mm selects new victim. > Oops. "MMF_OOM_SKIP mm selects new victim." according to http://lkml.kernel.org/r/201706271952.FEB21375.SFJFHOQLOtVOMF@I-love.SAKURA.ne.jp . -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-27 14:01 ` Tetsuo Handa 2017-07-27 14:08 ` Tetsuo Handa @ 2017-07-27 14:18 ` Michal Hocko 2017-07-27 14:45 ` Michal Hocko 2 siblings, 0 replies; 11+ messages in thread From: Michal Hocko @ 2017-07-27 14:18 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, rientjes, hannes, guro, linux-mm, linux-kernel On Thu 27-07-17 23:01:05, Tetsuo Handa wrote: > Michal Hocko wrote: > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 544d47e5cbbd..86a48affb938 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, > > * bypass the last charges so that they can exit quickly and > > * free their memory. > > */ > > - if (unlikely(test_thread_flag(TIF_MEMDIE) || > > + if (unlikely(tsk_is_oom_victim(current) || > > fatal_signal_pending(current) || > > current->flags & PF_EXITING)) > > goto force; > > Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ? I will double check. > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index c9f3569a76c7..65cc2f9aaa05 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -483,7 +483,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) > > * [...] > > * out_of_memory > > * select_bad_process > > - * # no TIF_MEMDIE task selects new victim > > + * # no TIF_MEMDIE, selects new victim > > * unmap_page_range # frees some memory > > */ > > mutex_lock(&oom_lock); > > This comment is wrong. No MMF_OOM_SKIP mm selects new victim. This hunk shouldn't make it to the patch. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-27 14:01 ` Tetsuo Handa 2017-07-27 14:08 ` Tetsuo Handa 2017-07-27 14:18 ` Michal Hocko @ 2017-07-27 14:45 ` Michal Hocko 2017-07-27 14:55 ` Roman Gushchin 2 siblings, 1 reply; 11+ messages in thread From: Michal Hocko @ 2017-07-27 14:45 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, rientjes, hannes, guro, linux-mm, linux-kernel On Thu 27-07-17 23:01:05, Tetsuo Handa wrote: > Michal Hocko wrote: > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 544d47e5cbbd..86a48affb938 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, > > * bypass the last charges so that they can exit quickly and > > * free their memory. > > */ > > - if (unlikely(test_thread_flag(TIF_MEMDIE) || > > + if (unlikely(tsk_is_oom_victim(current) || > > fatal_signal_pending(current) || > > current->flags & PF_EXITING)) > > goto force; > > Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ? OK, so your concern was > Does this test_thread_flag(TIF_MEMDIE) (or tsk_is_oom_victim(current)) make sense? > > If current thread is OOM-killed, SIGKILL must be pending before arriving at > do_exit() and PF_EXITING must be set after arriving at do_exit(). > But I can't find locations which do memory allocation between clearing > SIGKILL and setting PF_EXITING. I can't find them either and maybe there are none. But why do we care in this particular patch which merely replaces TIF_MEMDIE check by tsk_is_oom_victim? The code will surely not become less valid. If you believe this check is redundant then send a patch with the clear justification. But I would say, at least from the robustness point of view I would just keep it there. We do not really have any control on what happens between clearing signals and setting PF_EXITING. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-27 14:45 ` Michal Hocko @ 2017-07-27 14:55 ` Roman Gushchin 0 siblings, 0 replies; 11+ messages in thread From: Roman Gushchin @ 2017-07-27 14:55 UTC (permalink / raw) To: Michal Hocko; +Cc: Tetsuo Handa, akpm, rientjes, hannes, linux-mm, linux-kernel On Thu, Jul 27, 2017 at 04:45:44PM +0200, Michal Hocko wrote: > On Thu 27-07-17 23:01:05, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index 544d47e5cbbd..86a48affb938 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, > > > * bypass the last charges so that they can exit quickly and > > > * free their memory. > > > */ > > > - if (unlikely(test_thread_flag(TIF_MEMDIE) || > > > + if (unlikely(tsk_is_oom_victim(current) || > > > fatal_signal_pending(current) || > > > current->flags & PF_EXITING)) > > > goto force; > > > > Did we check http://lkml.kernel.org/r/20160909140508.GO4844@dhcp22.suse.cz ? > > OK, so your concern was > > > Does this test_thread_flag(TIF_MEMDIE) (or tsk_is_oom_victim(current)) make sense? > > > > If current thread is OOM-killed, SIGKILL must be pending before arriving at > > do_exit() and PF_EXITING must be set after arriving at do_exit(). > > > But I can't find locations which do memory allocation between clearing > > SIGKILL and setting PF_EXITING. > > I can't find them either and maybe there are none. But why do we care > in this particular patch which merely replaces TIF_MEMDIE check by > tsk_is_oom_victim? The code will surely not become less valid. If > you believe this check is redundant then send a patch with the clear > justification. But I would say, at least from the robustness point of > view I would just keep it there. We do not really have any control on > what happens between clearing signals and setting PF_EXITING. I agree, this check is probably redundant, but it really makes no difference, let's keep it bullet-proof. If we care about performance here, we can rearrange the checks: if (unlikely(fatal_signal_pending(current) || current->flags & PF_EXITING) || tsk_is_oom_victim(current)) goto force; Roman -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-27 9:03 ` [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Michal Hocko 2017-07-27 14:01 ` Tetsuo Handa @ 2017-07-29 8:33 ` kbuild test robot 2017-07-31 6:46 ` Michal Hocko 1 sibling, 1 reply; 11+ messages in thread From: kbuild test robot @ 2017-07-29 8:33 UTC (permalink / raw) To: Michal Hocko Cc: kbuild-all, Andrew Morton, David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm, LKML, Michal Hocko [-- Attachment #1: Type: text/plain, Size: 4865 bytes --] Hi Michal, [auto build test ERROR on cgroup/for-next] [also build test ERROR on v4.13-rc2 next-20170728] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-oom-do-not-rely-on-TIF_MEMDIE-for-memory-reserves-access/20170728-101955 base: https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next config: i386-randconfig-c0-07291424 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): In file included from include/linux/ioport.h:12:0, from include/linux/device.h:16, from include/linux/node.h:17, from include/linux/cpu.h:16, from kernel/cgroup/cpuset.c:25: kernel/cgroup/cpuset.c: In function '__cpuset_node_allowed': >> include/linux/compiler.h:123:18: error: implicit declaration of function 'tsk_is_oom_victim' [-Werror=implicit-function-declaration] static struct ftrace_likely_data \ ^ include/linux/compiler.h:156:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^ kernel/cgroup/cpuset.c:2546:2: note: in expansion of macro 'if' if (unlikely(tsk_is_oom_victim(current))) ^ include/linux/compiler.h:146:24: note: in expansion of macro '__branch_check__' # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x))) ^ kernel/cgroup/cpuset.c:2546:6: note: in expansion of macro 'unlikely' if (unlikely(tsk_is_oom_victim(current))) ^ cc1: some warnings being treated as errors -- In file included from include/linux/ioport.h:12:0, from include/linux/device.h:16, from include/linux/node.h:17, from include/linux/cpu.h:16, from kernel//cgroup/cpuset.c:25: kernel//cgroup/cpuset.c: In function '__cpuset_node_allowed': >> include/linux/compiler.h:123:18: error: implicit declaration of function 'tsk_is_oom_victim' [-Werror=implicit-function-declaration] static struct ftrace_likely_data \ ^ include/linux/compiler.h:156:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^ kernel//cgroup/cpuset.c:2546:2: note: in expansion of macro 'if' if (unlikely(tsk_is_oom_victim(current))) ^ include/linux/compiler.h:146:24: note: in expansion of macro '__branch_check__' # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x))) ^ kernel//cgroup/cpuset.c:2546:6: note: in expansion of macro 'unlikely' if (unlikely(tsk_is_oom_victim(current))) ^ cc1: some warnings being treated as errors vim +/tsk_is_oom_victim +123 include/linux/compiler.h 1f0d69a9 Steven Rostedt 2008-11-12 120 d45ae1f7 Steven Rostedt (VMware 2017-01-17 121) #define __branch_check__(x, expect, is_constant) ({ \ 1f0d69a9 Steven Rostedt 2008-11-12 122 int ______r; \ 134e6a03 Steven Rostedt (VMware 2017-01-19 @123) static struct ftrace_likely_data \ 1f0d69a9 Steven Rostedt 2008-11-12 124 __attribute__((__aligned__(4))) \ 45b79749 Steven Rostedt 2008-11-21 125 __attribute__((section("_ftrace_annotated_branch"))) \ 1f0d69a9 Steven Rostedt 2008-11-12 126 ______f = { \ 134e6a03 Steven Rostedt (VMware 2017-01-19 127) .data.func = __func__, \ 134e6a03 Steven Rostedt (VMware 2017-01-19 128) .data.file = __FILE__, \ 134e6a03 Steven Rostedt (VMware 2017-01-19 129) .data.line = __LINE__, \ 1f0d69a9 Steven Rostedt 2008-11-12 130 }; \ d45ae1f7 Steven Rostedt (VMware 2017-01-17 131) ______r = __builtin_expect(!!(x), expect); \ d45ae1f7 Steven Rostedt (VMware 2017-01-17 132) ftrace_likely_update(&______f, ______r, \ d45ae1f7 Steven Rostedt (VMware 2017-01-17 133) expect, is_constant); \ 1f0d69a9 Steven Rostedt 2008-11-12 134 ______r; \ 1f0d69a9 Steven Rostedt 2008-11-12 135 }) 1f0d69a9 Steven Rostedt 2008-11-12 136 :::::: The code at line 123 was first introduced by commit :::::: 134e6a034cb004ed5acd3048792de70ced1c6cf5 tracing: Show number of constants profiled in likely profiler :::::: TO: Steven Rostedt (VMware) <rostedt@goodmis.org> :::::: CC: Steven Rostedt (VMware) <rostedt@goodmis.org> --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 28292 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim 2017-07-29 8:33 ` kbuild test robot @ 2017-07-31 6:46 ` Michal Hocko 0 siblings, 0 replies; 11+ messages in thread From: Michal Hocko @ 2017-07-31 6:46 UTC (permalink / raw) To: kbuild test robot Cc: kbuild-all, Andrew Morton, David Rientjes, Johannes Weiner, Roman Gushchin, Tetsuo Handa, linux-mm, LKML On Sat 29-07-17 16:33:35, kbuild test robot wrote: > Hi Michal, > > [auto build test ERROR on cgroup/for-next] > [also build test ERROR on v4.13-rc2 next-20170728] > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] > > url: https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-oom-do-not-rely-on-TIF_MEMDIE-for-memory-reserves-access/20170728-101955 > base: https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next > config: i386-randconfig-c0-07291424 (attached as .config) > compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 > reproduce: > # save the attached .config to linux build tree > make ARCH=i386 > > All errors (new ones prefixed by >>): > > In file included from include/linux/ioport.h:12:0, > from include/linux/device.h:16, > from include/linux/node.h:17, > from include/linux/cpu.h:16, > from kernel/cgroup/cpuset.c:25: > kernel/cgroup/cpuset.c: In function '__cpuset_node_allowed': > >> include/linux/compiler.h:123:18: error: implicit declaration of function 'tsk_is_oom_victim' [-Werror=implicit-function-declaration] Thanks for the report. We need this --- commit 638b5ab1ed275f23b52a71941b66c8966d332cd7 Author: Michal Hocko <mhocko@suse.com> Date: Mon Jul 31 08:45:53 2017 +0200 fold me - fix implicit declaration of function 'tsk_is_oom_victim' reported by 0day Signed-off-by: Michal Hocko <mhocko@suse.com> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 1cc53dff0d94..734ae4fa9775 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -56,6 +56,7 @@ #include <linux/time64.h> #include <linux/backing-dev.h> #include <linux/sort.h> +#include <linux/oom.h> #include <linux/uaccess.h> #include <linux/atomic.h> -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-08-10 7:50 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-10 7:50 [PATCH v2 0/2] mm, oom: do not grant oom victims full memory reserves access Michal Hocko 2017-08-10 7:50 ` [PATCH 1/2] mm, oom: do not rely on TIF_MEMDIE for " Michal Hocko 2017-08-10 7:50 ` [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Michal Hocko -- strict thread matches above, loose matches on Subject: below -- 2017-07-27 9:03 [PATCH 0/2] mm, oom: do not grant oom victims full memory reserves access Michal Hocko 2017-07-27 9:03 ` [PATCH 2/2] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Michal Hocko 2017-07-27 14:01 ` Tetsuo Handa 2017-07-27 14:08 ` Tetsuo Handa 2017-07-27 14:18 ` Michal Hocko 2017-07-27 14:45 ` Michal Hocko 2017-07-27 14:55 ` Roman Gushchin 2017-07-29 8:33 ` kbuild test robot 2017-07-31 6:46 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).