* [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves @ 2015-11-11 13:48 mhocko 2015-11-11 15:54 ` Johannes Weiner 2015-11-22 12:55 ` Vlastimil Babka 0 siblings, 2 replies; 13+ messages in thread From: mhocko @ 2015-11-11 13:48 UTC (permalink / raw) To: Andrew Morton Cc: Johannes Weiner, Andrea Arcangeli, Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> __GFP_NOFAIL is a big hammer used to ensure that the allocation request can never fail. This is a strong requirement and as such it also deserves a special treatment when the system is OOM. The primary problem here is that the allocation request might have come with some locks held and the oom victim might be blocked on the same locks. This is basically an OOM deadlock situation. This patch tries to reduce the risk of such a deadlocks by giving __GFP_NOFAIL allocations a special treatment and let them dive into memory reserves after oom killer invocation. This should help them to make a progress and release resources they are holding. The OOM victim should compensate for the reserves consumption. Signed-off-by: Michal Hocko <mhocko@suse.com> --- Hi, this has been posted previously as a part of larger GFP_NOFS related patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org) but Andrea was asking basically the same thing at LSF early this year (I cannot seem to find it in any public archive though). I think the patch makes some sense on its own. Comments? mm/page_alloc.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8034909faad2..d30bce9d7ac8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, goto out; } /* Exhausted what can be done so it's blamo time */ - if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) + if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { *did_some_progress = 1; + + if (gfp_mask & __GFP_NOFAIL) { + page = get_page_from_freelist(gfp_mask, order, + ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac); + WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation." + " Consider increasing min_free_kbytes.\n"); + } + } out: mutex_unlock(&oom_lock); return page; -- 2.6.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-11 13:48 [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko @ 2015-11-11 15:54 ` Johannes Weiner 2015-11-12 8:51 ` Michal Hocko 2015-11-22 12:55 ` Vlastimil Babka 1 sibling, 1 reply; 13+ messages in thread From: Johannes Weiner @ 2015-11-11 15:54 UTC (permalink / raw) To: mhocko Cc: Andrew Morton, Andrea Arcangeli, Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko On Wed, Nov 11, 2015 at 02:48:17PM +0100, mhocko@kernel.org wrote: > From: Michal Hocko <mhocko@suse.com> > > __GFP_NOFAIL is a big hammer used to ensure that the allocation > request can never fail. This is a strong requirement and as such > it also deserves a special treatment when the system is OOM. The > primary problem here is that the allocation request might have > come with some locks held and the oom victim might be blocked > on the same locks. This is basically an OOM deadlock situation. > > This patch tries to reduce the risk of such a deadlocks by giving > __GFP_NOFAIL allocations a special treatment and let them dive into > memory reserves after oom killer invocation. This should help them > to make a progress and release resources they are holding. The OOM > victim should compensate for the reserves consumption. > > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > > Hi, > this has been posted previously as a part of larger GFP_NOFS related > patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org) > but Andrea was asking basically the same thing at LSF early this year > (I cannot seem to find it in any public archive though). I think the > patch makes some sense on its own. I sent this right after LSF based on Andrea's suggestion: https://lkml.org/lkml/2015/3/25/37 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-11 15:54 ` Johannes Weiner @ 2015-11-12 8:51 ` Michal Hocko 0 siblings, 0 replies; 13+ messages in thread From: Michal Hocko @ 2015-11-12 8:51 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Andrea Arcangeli, Mel Gorman, David Rientjes, linux-mm, LKML On Wed 11-11-15 10:54:46, Johannes Weiner wrote: > On Wed, Nov 11, 2015 at 02:48:17PM +0100, mhocko@kernel.org wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > __GFP_NOFAIL is a big hammer used to ensure that the allocation > > request can never fail. This is a strong requirement and as such > > it also deserves a special treatment when the system is OOM. The > > primary problem here is that the allocation request might have > > come with some locks held and the oom victim might be blocked > > on the same locks. This is basically an OOM deadlock situation. > > > > This patch tries to reduce the risk of such a deadlocks by giving > > __GFP_NOFAIL allocations a special treatment and let them dive into > > memory reserves after oom killer invocation. This should help them > > to make a progress and release resources they are holding. The OOM > > victim should compensate for the reserves consumption. > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > > > Hi, > > this has been posted previously as a part of larger GFP_NOFS related > > patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org) > > but Andrea was asking basically the same thing at LSF early this year > > (I cannot seem to find it in any public archive though). I think the > > patch makes some sense on its own. > > I sent this right after LSF based on Andrea's suggestion: > > https://lkml.org/lkml/2015/3/25/37 Ohh, I completely forgot as it was part of a larger series. Thanks for the pointer. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-11 13:48 [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko 2015-11-11 15:54 ` Johannes Weiner @ 2015-11-22 12:55 ` Vlastimil Babka 2015-11-23 9:29 ` Michal Hocko 1 sibling, 1 reply; 13+ messages in thread From: Vlastimil Babka @ 2015-11-22 12:55 UTC (permalink / raw) To: mhocko, Andrew Morton Cc: Johannes Weiner, Andrea Arcangeli, Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko On 11.11.2015 14:48, mhocko@kernel.org wrote: > mm/page_alloc.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8034909faad2..d30bce9d7ac8 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > goto out; > } > /* Exhausted what can be done so it's blamo time */ > - if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) > + if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { > *did_some_progress = 1; > + > + if (gfp_mask & __GFP_NOFAIL) { > + page = get_page_from_freelist(gfp_mask, order, > + ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac); > + WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation." > + " Consider increasing min_free_kbytes.\n"); It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part? Also s/gfp_nofail/GFP_NOFAIL/ for consistency? Hm and probably out of scope of your patch, but I understand the WARN_ONCE (WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping here. But for distinct tasks and potentially far away in time, wouldn't we want to see all the warnings? Would that be feasible to implement? > + } > + } > out: > mutex_unlock(&oom_lock); > return page; > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-22 12:55 ` Vlastimil Babka @ 2015-11-23 9:29 ` Michal Hocko 2015-11-23 9:43 ` Vlastimil Babka 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2015-11-23 9:29 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman, David Rientjes, linux-mm, LKML On Sun 22-11-15 13:55:31, Vlastimil Babka wrote: > On 11.11.2015 14:48, mhocko@kernel.org wrote: > > mm/page_alloc.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 8034909faad2..d30bce9d7ac8 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > goto out; > > } > > /* Exhausted what can be done so it's blamo time */ > > - if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) > > + if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { > > *did_some_progress = 1; > > + > > + if (gfp_mask & __GFP_NOFAIL) { > > + page = get_page_from_freelist(gfp_mask, order, > > + ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac); > > + WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation." > > + " Consider increasing min_free_kbytes.\n"); > > It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part? They are warning about two different things. The first one catches a buggy code which uses __GFP_NOFAIL from oom disabled context while the second one tries to help the administrator with a hint that memory reserves are too small. > Also s/gfp_nofail/GFP_NOFAIL/ for consistency? Fair enough, changed. > Hm and probably out of scope of your patch, but I understand the WARN_ONCE > (WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping > here. But for distinct tasks and potentially far away in time, wouldn't we want > to see all the warnings? Would that be feasible to implement? I was thinking about that as well some time ago but it was quite hard to find a good enough API to tell when to warn again. The first WARN_ON_ONCE should trigger for all different _code paths_ no matter how frequently they appear to catch all the buggy callers. The second one would benefit from a new warning after min_free_kbytes was updated because it would tell the administrator that the last update was not sufficient for the workload. > > > + } > > + } > > out: > > mutex_unlock(&oom_lock); > > return page; > > Thanks! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-23 9:29 ` Michal Hocko @ 2015-11-23 9:43 ` Vlastimil Babka 2015-11-23 10:13 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Vlastimil Babka @ 2015-11-23 9:43 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman, David Rientjes, linux-mm, LKML On 11/23/2015 10:29 AM, Michal Hocko wrote: > On Sun 22-11-15 13:55:31, Vlastimil Babka wrote: >> On 11.11.2015 14:48, mhocko@kernel.org wrote: >>> mm/page_alloc.c | 10 +++++++++- >>> 1 file changed, 9 insertions(+), 1 deletion(-) >>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index 8034909faad2..d30bce9d7ac8 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, >>> goto out; >>> } >>> /* Exhausted what can be done so it's blamo time */ >>> - if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) >>> + if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { >>> *did_some_progress = 1; >>> + >>> + if (gfp_mask & __GFP_NOFAIL) { >>> + page = get_page_from_freelist(gfp_mask, order, >>> + ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac); >>> + WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation." >>> + " Consider increasing min_free_kbytes.\n"); >> >> It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part? > > They are warning about two different things. The first one catches a > buggy code which uses __GFP_NOFAIL from oom disabled context while the Ah, I see, I misinterpreted what the return values of out_of_memory() mean. But now that I look at its code, it seems to only return false when oom_killer_disabled is set to true. Which is a global thing and nothing to do with the context of the __GFP_NOFAIL allocation? > second one tries to help the administrator with a hint that memory > reserves are too small. > >> Also s/gfp_nofail/GFP_NOFAIL/ for consistency? > > Fair enough, changed. > >> Hm and probably out of scope of your patch, but I understand the WARN_ONCE >> (WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping >> here. But for distinct tasks and potentially far away in time, wouldn't we want >> to see all the warnings? Would that be feasible to implement? > > I was thinking about that as well some time ago but it was quite > hard to find a good enough API to tell when to warn again. The first > WARN_ON_ONCE should trigger for all different _code paths_ no matter > how frequently they appear to catch all the buggy callers. The second > one would benefit from a new warning after min_free_kbytes was updated > because it would tell the administrator that the last update was not > sufficient for the workload. Hm, what about adding a flag to the struct alloc_context, so that when the particular allocation attempt emits the warning, it sets a flag in the alloc_context so that it won't emit them again as long as it keeps looping and attempting oom. Other allocations will warn independently. We could also print the same info as the "allocation failed" warnings do, since it's very similar, except we can't fail - but the admin/bug reporter should be interested in the same details as for an allocation failure that is allowed to fail. But it's also true that we have probably just printed the info during out_of_memory()... except when we skipped that for some reason? >> >>> + } >>> + } >>> out: >>> mutex_unlock(&oom_lock); >>> return page; >>> > > Thanks! > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-23 9:43 ` Vlastimil Babka @ 2015-11-23 10:13 ` Michal Hocko 2015-11-23 21:26 ` David Rientjes 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2015-11-23 10:13 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman, David Rientjes, linux-mm, LKML On Mon 23-11-15 10:43:42, Vlastimil Babka wrote: > On 11/23/2015 10:29 AM, Michal Hocko wrote: > >On Sun 22-11-15 13:55:31, Vlastimil Babka wrote: > >>On 11.11.2015 14:48, mhocko@kernel.org wrote: > >>> mm/page_alloc.c | 10 +++++++++- > >>> 1 file changed, 9 insertions(+), 1 deletion(-) > >>> > >>>diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>index 8034909faad2..d30bce9d7ac8 100644 > >>>--- a/mm/page_alloc.c > >>>+++ b/mm/page_alloc.c > >>>@@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > >>> goto out; > >>> } > >>> /* Exhausted what can be done so it's blamo time */ > >>>- if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) > >>>+ if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { > >>> *did_some_progress = 1; > >>>+ > >>>+ if (gfp_mask & __GFP_NOFAIL) { > >>>+ page = get_page_from_freelist(gfp_mask, order, > >>>+ ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac); > >>>+ WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation." > >>>+ " Consider increasing min_free_kbytes.\n"); > >> > >>It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part? > > > >They are warning about two different things. The first one catches a > >buggy code which uses __GFP_NOFAIL from oom disabled context while the > > Ah, I see, I misinterpreted what the return values of out_of_memory() mean. > But now that I look at its code, it seems to only return false when > oom_killer_disabled is set to true. Which is a global thing and nothing to > do with the context of the __GFP_NOFAIL allocation? I am not sure I follow you here. The point of the warning is to warn when the oom killer is disbaled (out_of_memory returns false) _and_ the request is __GFP_NOFAIL because we simply cannot guarantee any forward progress and just a use of the allocation flag is not supproted. [...] > >>Hm and probably out of scope of your patch, but I understand the WARN_ONCE > >>(WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping > >>here. But for distinct tasks and potentially far away in time, wouldn't we want > >>to see all the warnings? Would that be feasible to implement? > > > >I was thinking about that as well some time ago but it was quite > >hard to find a good enough API to tell when to warn again. The first > >WARN_ON_ONCE should trigger for all different _code paths_ no matter > >how frequently they appear to catch all the buggy callers. The second > >one would benefit from a new warning after min_free_kbytes was updated > >because it would tell the administrator that the last update was not > >sufficient for the workload. > > Hm, what about adding a flag to the struct alloc_context, so that when the > particular allocation attempt emits the warning, it sets a flag in the > alloc_context so that it won't emit them again as long as it keeps looping > and attempting oom. Other allocations will warn independently. That could still trigger a flood of messages. Say you have many concurrent users from the same call path... I am not really sure making the code more complicating for this warning is really worth it. If anything we can use ratelimited variant. > We could also print the same info as the "allocation failed" warnings do, > since it's very similar, except we can't fail - but the admin/bug reporter > should be interested in the same details as for an allocation failure that > is allowed to fail. But it's also true that we have probably just printed > the info during out_of_memory()... except when we skipped that for some > reason? The first WARN_ON_ONCE happens when OOM killer doesn't trigger so a memory situation might be worth considering. The later one might have seen the OOM report which is the likely case. So if anyting the first one should dump the info. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-23 10:13 ` Michal Hocko @ 2015-11-23 21:26 ` David Rientjes 2015-11-24 9:47 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: David Rientjes @ 2015-11-23 21:26 UTC (permalink / raw) To: Michal Hocko Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman, linux-mm, LKML On Mon, 23 Nov 2015, Michal Hocko wrote: > > >>>diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > >>>index 8034909faad2..d30bce9d7ac8 100644 > > >>>--- a/mm/page_alloc.c > > >>>+++ b/mm/page_alloc.c > > >>>@@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > >>> goto out; > > >>> } > > >>> /* Exhausted what can be done so it's blamo time */ > > >>>- if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) > > >>>+ if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { > > >>> *did_some_progress = 1; > > >>>+ > > >>>+ if (gfp_mask & __GFP_NOFAIL) { > > >>>+ page = get_page_from_freelist(gfp_mask, order, > > >>>+ ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac); > > >>>+ WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation." > > >>>+ " Consider increasing min_free_kbytes.\n"); > > >> > > >>It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part? > > > > > >They are warning about two different things. The first one catches a > > >buggy code which uses __GFP_NOFAIL from oom disabled context while the > > > > Ah, I see, I misinterpreted what the return values of out_of_memory() mean. > > But now that I look at its code, it seems to only return false when > > oom_killer_disabled is set to true. Which is a global thing and nothing to > > do with the context of the __GFP_NOFAIL allocation? > > I am not sure I follow you here. The point of the warning is to warn > when the oom killer is disbaled (out_of_memory returns false) _and_ the > request is __GFP_NOFAIL because we simply cannot guarantee any forward > progress and just a use of the allocation flag is not supproted. > I don't think the WARN_ONCE() above is helpful for a few reasons: - it suggests that min_free_kbytes is the best way to work around such issues and gives kernel developers a free pass to just say "raise min_free_kbytes" rather than reducing their reliance on __GFP_NOFAIL, - raising min_free_kbytes is not immediately actionable without memory freeing to fix any oom issue, and - it relies on the earlier warning to dump the state of memory and doesn't add any significant information to help understand how seperate occurrences are similar or different. I think the WARN_ONCE() should just be removed. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-23 21:26 ` David Rientjes @ 2015-11-24 9:47 ` Michal Hocko 2015-11-24 16:26 ` Johannes Weiner 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2015-11-24 9:47 UTC (permalink / raw) To: David Rientjes Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman, linux-mm, LKML On Mon 23-11-15 13:26:49, David Rientjes wrote: > On Mon, 23 Nov 2015, Michal Hocko wrote: [...] > > I am not sure I follow you here. The point of the warning is to warn > > when the oom killer is disbaled (out_of_memory returns false) _and_ the > > request is __GFP_NOFAIL because we simply cannot guarantee any forward > > progress and just a use of the allocation flag is not supproted. > > > > I don't think the WARN_ONCE() above is helpful for a few reasons: > > - it suggests that min_free_kbytes is the best way to work around such > issues and gives kernel developers a free pass to just say "raise > min_free_kbytes" rather than reducing their reliance on __GFP_NOFAIL, I disagree. Users are quite sensitive to warnings with backtraces in the log from my experience and they report them. And while the log shows the code path which triggers the issue which can help us to change the code it also gives a useful hint on how to reduce this issue until we are able to either fix a bug or a permanent configuration if we are not able to get rid of it for whatever reason. Besides that there is no other reliable warning that we are getting _really_ short on memory unlike when the allocation failure is allowed. OOM killer report might be missing because there was no actual killing happening. > - raising min_free_kbytes is not immediately actionable without memory > freeing to fix any oom issue, and true but it can be done to reduce chances for the issue to reappear. > - it relies on the earlier warning to dump the state of memory and > doesn't add any significant information to help understand how seperate > occurrences are similar or different. The information is quite valuable even without OOM killer report IMHO. > I think the WARN_ONCE() should just be removed. I do not insist on keeping it but I really think it might be useful while it doesn't seem to cause any confusion IMHO. So unless there is a strong reason to not include it I would prefer keeping it. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-24 9:47 ` Michal Hocko @ 2015-11-24 16:26 ` Johannes Weiner 2015-11-24 17:02 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Johannes Weiner @ 2015-11-24 16:26 UTC (permalink / raw) To: Michal Hocko Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli, Mel Gorman, linux-mm, LKML On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote: > Besides that there is no other reliable warning that we are getting > _really_ short on memory unlike when the allocation failure is > allowed. OOM killer report might be missing because there was no actual > killing happening. This is why I would like to see that warning generalized, and not just for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL that will loop forever in the allocator, and when this deadlocks the machine all we see is other tasks hanging, but not the culprit. If we were to get a backtrace of some task in the allocator that is known to hold locks, suddenly all the other hung tasks will make sense, and it will clearly distinguish such an allocator deadlock from other issues. Do you remember the patch you proposed at LSF about failing requests after looping a certain (configurable) number of times? Well, instead of failing them, it would be good to start WARNING after a certain # of loops when we know we won't quit (implicit or explicit NOFAIL). [ Kind of like fs/xfs/kmem::kmem_alloc() does, only that that is currently dead code due to our looping inside the allocator. ] -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-24 16:26 ` Johannes Weiner @ 2015-11-24 17:02 ` Michal Hocko 2015-11-24 19:57 ` Johannes Weiner 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2015-11-24 17:02 UTC (permalink / raw) To: Johannes Weiner Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli, Mel Gorman, linux-mm, LKML On Tue 24-11-15 11:26:04, Johannes Weiner wrote: > On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote: > > Besides that there is no other reliable warning that we are getting > > _really_ short on memory unlike when the allocation failure is > > allowed. OOM killer report might be missing because there was no actual > > killing happening. > > This is why I would like to see that warning generalized, and not just > for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL > that will loop forever in the allocator, Yes but does it make sense to warn for all of them? Wouldn't it be sufficient to warn about those which cannot allocate anything even though they are doing ALLOC_NO_WATERMARKS? We could still hint the administrator to increase min_free_kbytes for his particular workload. Such a situation should be really exceptional and warn_once should be sufficient. > and when this deadlocks the > machine all we see is other tasks hanging, but not the culprit. If we > were to get a backtrace of some task in the allocator that is known to > hold locks, suddenly all the other hung tasks will make sense, and it > will clearly distinguish such an allocator deadlock from other issues. Tetsuo was suggesting a more sophisticated infrastructure for tracking allocations [1] which take too long without making progress. I haven't seen his patch because I was too busy with other stuff but maybe this is what you would like to see? Anyway I would like to make some progress on this patch. Do you think that it would be acceptable in the current form without the warning or you preffer a different way? [1] http://lkml.kernel.org/r/201510182105.AGA00839.FHVFFStLQOMOOJ%40I-love.SAKURA.ne.jp -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-24 17:02 ` Michal Hocko @ 2015-11-24 19:57 ` Johannes Weiner 2015-11-25 9:33 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Johannes Weiner @ 2015-11-24 19:57 UTC (permalink / raw) To: Michal Hocko Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli, Mel Gorman, linux-mm, LKML On Tue, Nov 24, 2015 at 06:02:39PM +0100, Michal Hocko wrote: > On Tue 24-11-15 11:26:04, Johannes Weiner wrote: > > On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote: > > > Besides that there is no other reliable warning that we are getting > > > _really_ short on memory unlike when the allocation failure is > > > allowed. OOM killer report might be missing because there was no actual > > > killing happening. > > > > This is why I would like to see that warning generalized, and not just > > for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL > > that will loop forever in the allocator, > > Yes but does it make sense to warn for all of them? Wouldn't it be > sufficient to warn about those which cannot allocate anything even > though they are doing ALLOC_NO_WATERMARKS? Why is it important whether they can do ALLOC_NO_WATERMARKS or not? I'm worried about all those that can loop forever with locks held. > > and when this deadlocks the > > machine all we see is other tasks hanging, but not the culprit. If we > > were to get a backtrace of some task in the allocator that is known to > > hold locks, suddenly all the other hung tasks will make sense, and it > > will clearly distinguish such an allocator deadlock from other issues. > > Tetsuo was suggesting a more sophisticated infrastructure for tracking > allocations [1] which take too long without making progress. I haven't > seen his patch because I was too busy with other stuff but maybe this is > what you would like to see? That seems a bit excessive. I was thinking something more like this: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 05ef7fb..fbfc581 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3004,6 +3004,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, enum migrate_mode migration_mode = MIGRATE_ASYNC; bool deferred_compaction = false; int contended_compaction = COMPACT_CONTENDED_NONE; + unsigned int nr_tries = 0; /* * In the slowpath, we sanity check order to avoid ever trying to @@ -3033,6 +3034,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto nopage; retry: + if (++nr_retries % 100 == 0) + warn_alloc_failed(gfp_mask, order, "Potential GFP deadlock\n"); + if (gfp_mask & __GFP_KSWAPD_RECLAIM) wake_all_kswapds(order, ac); > Anyway I would like to make some progress on this patch. Do you think > that it would be acceptable in the current form without the warning or > you preffer a different way? Oh, I have nothing against your patch, please go ahead with it. I just wondered out loud when you proposed a warning about deadlocking NOFAIL allocations but limited it to explicit __GFP_NOFAIL allocations, when those obviously aren't the only ones that can deadlock in that way. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves 2015-11-24 19:57 ` Johannes Weiner @ 2015-11-25 9:33 ` Michal Hocko 0 siblings, 0 replies; 13+ messages in thread From: Michal Hocko @ 2015-11-25 9:33 UTC (permalink / raw) To: Johannes Weiner Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli, Mel Gorman, linux-mm, LKML On Tue 24-11-15 14:57:10, Johannes Weiner wrote: > On Tue, Nov 24, 2015 at 06:02:39PM +0100, Michal Hocko wrote: > > On Tue 24-11-15 11:26:04, Johannes Weiner wrote: > > > On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote: > > > > Besides that there is no other reliable warning that we are getting > > > > _really_ short on memory unlike when the allocation failure is > > > > allowed. OOM killer report might be missing because there was no actual > > > > killing happening. > > > > > > This is why I would like to see that warning generalized, and not just > > > for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL > > > that will loop forever in the allocator, > > > > Yes but does it make sense to warn for all of them? Wouldn't it be > > sufficient to warn about those which cannot allocate anything even > > though they are doing ALLOC_NO_WATERMARKS? > > Why is it important whether they can do ALLOC_NO_WATERMARKS or not? Well, the idea was that ALLOC_NO_WATERMARKS failures mean that memory reserves are not sufficient to handle given workload. min_free_kbytes is auto-tuned and it might be not sufficient - especially now that we are adding a new class of consumers of the reserves. I find a warning as an appropriate way to tell administrator that the auto-tuning was too optimistic for the particular load. > I'm worried about all those that can loop forever with locks held. I can see your point here but I think that looping endlessly without any progress is a different class of issue. It is hard to tune for it. You can change tunning and still can end up looping because the lock vs. reclaim dependencies will be still there. > > > and when this deadlocks the > > > machine all we see is other tasks hanging, but not the culprit. If we > > > were to get a backtrace of some task in the allocator that is known to > > > hold locks, suddenly all the other hung tasks will make sense, and it > > > will clearly distinguish such an allocator deadlock from other issues. > > > > Tetsuo was suggesting a more sophisticated infrastructure for tracking > > allocations [1] which take too long without making progress. I haven't > > seen his patch because I was too busy with other stuff but maybe this is > > what you would like to see? > > That seems a bit excessive. I was thinking something more like this: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 05ef7fb..fbfc581 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3004,6 +3004,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > enum migrate_mode migration_mode = MIGRATE_ASYNC; > bool deferred_compaction = false; > int contended_compaction = COMPACT_CONTENDED_NONE; > + unsigned int nr_tries = 0; > > /* > * In the slowpath, we sanity check order to avoid ever trying to > @@ -3033,6 +3034,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > goto nopage; > > retry: > + if (++nr_retries % 100 == 0) > + warn_alloc_failed(gfp_mask, order, "Potential GFP deadlock\n"); > + I am not against this in principle. It might be too noisy but warn_alloc_failed is throttled already so it should handle too many parallel requests already. I still think that ALLOC_NO_WATERMARKS failures deserve a special treatment because we can tune for that. Care to send a patch? > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > wake_all_kswapds(order, ac); > > > Anyway I would like to make some progress on this patch. Do you think > > that it would be acceptable in the current form without the warning or > > you preffer a different way? > > Oh, I have nothing against your patch, please go ahead with it. I just > wondered out loud when you proposed a warning about deadlocking NOFAIL > allocations but limited it to explicit __GFP_NOFAIL allocations, when > those obviously aren't the only ones that can deadlock in that way. As mentioned above I have added it merely because this would be a new consumer of the reserves which might lead to its depletion without an explicit warning. But the more I think about that the more I am convinced that this should be generalized to ALLOC_NO_WATERMARKS. I will remove the warning from the patch and prepare a separate one which will warn ALLOC_NO_WATERMARKS so that we can discuss that separately. Thanks -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2015-11-25 9:33 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-11-11 13:48 [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko 2015-11-11 15:54 ` Johannes Weiner 2015-11-12 8:51 ` Michal Hocko 2015-11-22 12:55 ` Vlastimil Babka 2015-11-23 9:29 ` Michal Hocko 2015-11-23 9:43 ` Vlastimil Babka 2015-11-23 10:13 ` Michal Hocko 2015-11-23 21:26 ` David Rientjes 2015-11-24 9:47 ` Michal Hocko 2015-11-24 16:26 ` Johannes Weiner 2015-11-24 17:02 ` Michal Hocko 2015-11-24 19:57 ` Johannes Weiner 2015-11-25 9:33 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).