* [PATCH 0/2] GFP_NOFAIL reserves + warning about reserves depletion
@ 2015-11-25 10:40 Michal Hocko
  2015-11-25 10:40 ` [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves Michal Hocko
  2015-11-25 10:40 ` [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures Michal Hocko
  0 siblings, 2 replies; 16+ messages in thread
From: Michal Hocko @ 2015-11-25 10:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, Johannes Weiner, linux-mm, LKML

Hi,
the first patch was posted last time [1] and there seems to be no major
opposition to it. The only concern was the warning used to note that an
ALLOC_NO_WATERMARKS request for a __GFP_NOFAIL allocation failed. I still
think that the warning is helpful, so I've split it into its own patch 2
and made it more generic, covering all ALLOC_NO_WATERMARKS failures. The
warning fires only once, but an update to min_free_kbytes re-arms it so
that it can be dumped again.

[1] http://lkml.kernel.org/r/1447249697-13380-1-git-send-email-mhocko@kernel.org

-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread
* [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-25 10:40 [PATCH 0/2] GFP_NOFAIL reserves + warning about reserves depletion Michal Hocko
@ 2015-11-25 10:40 ` Michal Hocko
  2015-11-25 10:51   ` David Rientjes
  2015-12-02 15:13   ` [PATCH v2] " Michal Hocko
  2015-11-25 10:40 ` [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures Michal Hocko
  1 sibling, 2 replies; 16+ messages in thread
From: Michal Hocko @ 2015-11-25 10:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, David Rientjes, Johannes Weiner, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__GFP_NOFAIL is a big hammer used to ensure that the allocation
request can never fail. This is a strong requirement and as such
it also deserves special treatment when the system is OOM. The
primary problem here is that the allocation request might have
come with some locks held and the oom victim might be blocked
on the same locks. This is basically an OOM deadlock situation.

This patch tries to reduce the risk of such deadlocks by giving
__GFP_NOFAIL allocations special treatment and letting them dive
into memory reserves after oom killer invocation. This should help
them make progress and release the resources they are holding. The
OOM victim should compensate for the reserves consumption.
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8034909faad2..70db11c27046 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2766,8 +2766,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		goto out;
 	}
 	/* Exhausted what can be done so it's blamo time */
-	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
+	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
 		*did_some_progress = 1;
+
+		if (gfp_mask & __GFP_NOFAIL)
+			page = get_page_from_freelist(gfp_mask, order,
+					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
+	}
 out:
 	mutex_unlock(&oom_lock);
 	return page;
-- 
2.6.2
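For readers less familiar with the allocator, here is a minimal user-space model of what the patch above changes. An ordinary request must leave the zone's min watermark untouched. A __GFP_NOFAIL request that has already gone through the OOM killer retries once with watermarks ignored. Everything here (sim_zone, the SIM_* flags, one-page granularity, a single watermark) is a hypothetical simplification and not kernel code. The real zone_watermark_ok() check also accounts for the order and lowmem_reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; they are NOT the kernel's. */
#define SIM_ALLOC_NO_WATERMARKS 0x1u
#define SIM_GFP_NOFAIL          0x2u

struct sim_zone {
	long free_pages; /* pages currently free in the zone */
	long min_wmark;  /* the last min_wmark pages form the emergency reserve */
};

/*
 * Take one page. Ordinary requests must leave at least min_wmark pages
 * free; ALLOC_NO_WATERMARKS requests may use the reserve down to zero.
 */
static bool sim_take_page(struct sim_zone *z, unsigned int alloc_flags)
{
	assert(z != NULL);
	if (z->free_pages <= 0)
		return false; /* even the reserve is gone */
	if (!(alloc_flags & SIM_ALLOC_NO_WATERMARKS) &&
	    z->free_pages - 1 < z->min_wmark)
		return false; /* would dip below the min watermark */
	z->free_pages--;
	return true;
}

/*
 * Simplified model of the patched path: after the (simulated) OOM killer
 * has run, a __GFP_NOFAIL request retries with watermarks ignored; every
 * other request just reports failure here.
 */
static bool sim_alloc_after_oom(struct sim_zone *z, unsigned int gfp)
{
	if (sim_take_page(z, 0))
		return true;
	if (gfp & SIM_GFP_NOFAIL)
		return sim_take_page(z, SIM_ALLOC_NO_WATERMARKS);
	return false;
}
```

When free_pages equals min_wmark, a plain request fails. The __GFP_NOFAIL retry takes one page out of the reserve, which is the consumption the OOM victim is expected to compensate for.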
* Re: [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-25 10:40 ` [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves Michal Hocko
@ 2015-11-25 10:51 ` David Rientjes
  2015-11-25 11:18   ` Michal Hocko
  2015-12-02 15:13 ` [PATCH v2] " Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: David Rientjes @ 2015-11-25 10:51 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML, Michal Hocko

On Wed, 25 Nov 2015, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> __GFP_NOFAIL is a big hammer used to ensure that the allocation
> request can never fail. This is a strong requirement and as such
> it also deserves special treatment when the system is OOM. The
> primary problem here is that the allocation request might have
> come with some locks held and the oom victim might be blocked
> on the same locks. This is basically an OOM deadlock situation.
> 
> This patch tries to reduce the risk of such deadlocks by giving
> __GFP_NOFAIL allocations special treatment and letting them dive
> into memory reserves after oom killer invocation. This should help
> them make progress and release the resources they are holding. The
> OOM victim should compensate for the reserves consumption.
> 
> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8034909faad2..70db11c27046 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2766,8 +2766,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  		goto out;
>  	}
>  	/* Exhausted what can be done so it's blamo time */
> -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>  		*did_some_progress = 1;
> +
> +		if (gfp_mask & __GFP_NOFAIL)
> +			page = get_page_from_freelist(gfp_mask, order,
> +					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> +	}
>  out:
>  	mutex_unlock(&oom_lock);
>  	return page;

I don't understand why you're setting ALLOC_CPUSET if you're giving them
"special treatment".  If you want to allow access to memory reserves to
prevent an oom livelock, then why not also allow it access to allocate
outside its cpuset?
* Re: [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-25 10:51 ` David Rientjes
@ 2015-11-25 11:18   ` Michal Hocko
  2015-11-25 20:57     ` David Rientjes
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2015-11-25 11:18 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Wed 25-11-15 02:51:38, David Rientjes wrote:
> On Wed, 25 Nov 2015, Michal Hocko wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > __GFP_NOFAIL is a big hammer used to ensure that the allocation
> > request can never fail. This is a strong requirement and as such
> > it also deserves special treatment when the system is OOM. The
> > primary problem here is that the allocation request might have
> > come with some locks held and the oom victim might be blocked
> > on the same locks. This is basically an OOM deadlock situation.
> > 
> > This patch tries to reduce the risk of such deadlocks by giving
> > __GFP_NOFAIL allocations special treatment and letting them dive
> > into memory reserves after oom killer invocation. This should help
> > them make progress and release the resources they are holding. The
> > OOM victim should compensate for the reserves consumption.
> > 
> > Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/page_alloc.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 8034909faad2..70db11c27046 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2766,8 +2766,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  		goto out;
> >  	}
> >  	/* Exhausted what can be done so it's blamo time */
> > -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> > +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> >  		*did_some_progress = 1;
> > +
> > +		if (gfp_mask & __GFP_NOFAIL)
> > +			page = get_page_from_freelist(gfp_mask, order,
> > +					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> > +	}
> >  out:
> >  	mutex_unlock(&oom_lock);
> >  	return page;
> 
> I don't understand why you're setting ALLOC_CPUSET if you're giving them
> "special treatment".  If you want to allow access to memory reserves to
> prevent an oom livelock, then why not also allow it access to allocate
> outside its cpuset?

Good question. My thinking was that __GFP_NOFAIL allocations might be
done on behalf of a process, so they are not necessarily system wide. We
do the same before we actually go to out_of_memory. On the other hand,
__GFP_NOFAIL should be used really rarely, so breaking the cpuset
restriction shouldn't be a big deal if that helps to break out of a
potential OOM deadlock.

I will drop it. Thanks!
---
* Re: [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-25 11:18 ` Michal Hocko
@ 2015-11-25 20:57   ` David Rientjes
  2015-11-26  9:34     ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2015-11-25 20:57 UTC (permalink / raw)
To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Wed, 25 Nov 2015, Michal Hocko wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8034909faad2..94b04c1e894a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2766,8 +2766,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  		goto out;
>  	}
>  	/* Exhausted what can be done so it's blamo time */
> -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>  		*did_some_progress = 1;
> +
> +		if (gfp_mask & __GFP_NOFAIL)
> +			page = get_page_from_freelist(gfp_mask, order,
> +					ALLOC_NO_WATERMARKS, ac);
> +	}
>  out:
>  	mutex_unlock(&oom_lock);
>  	return page;

Well, sure, that's one way to do it, but for cpuset users, wouldn't this
lead to a depletion of the first system zone since you've dropped
ALLOC_CPUSET and are doing ALLOC_NO_WATERMARKS in the same call?
get_page_from_freelist() shouldn't be doing any balancing over the set of
allowed zones.  Can you justify depleting memory reserves on a zone
outside of the set of allowed cpuset mems rather than trying to drop
ALLOC_CPUSET first?
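David's concern can be illustrated with a small model of the zonelist walk. The allocator takes a page from the first zone that passes its checks and does not balance between zones. If both the watermark and the cpuset checks are dropped, every reserve allocation therefore lands on the first zone in the list, even when that zone's node is outside the task's cpuset. This is a hypothetical sketch. The structures, flag values and bitmask cpuset representation are invented for illustration, and the sketch assumes the zonelist happens to start with a zone on a disallowed node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; they are NOT the kernel's. */
#define SIM_ALLOC_NO_WATERMARKS 0x1u
#define SIM_ALLOC_CPUSET        0x2u

struct sim_zone {
	int  node;       /* NUMA node the zone belongs to */
	long free_pages; /* pages currently free in the zone */
	long min_wmark;  /* reserve that only NO_WATERMARKS may use */
};

/* Hypothetical cpuset representation: bit n set means node n is allowed. */
static bool sim_node_allowed(unsigned long mems_allowed, int node)
{
	assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)));
	return (mems_allowed >> node) & 1UL;
}

/*
 * Walk the zonelist in order and take a page from the FIRST zone that
 * passes the enabled checks; there is no balancing between zones.
 * Returns the index of the zone used, or -1 if none could serve.
 */
static int sim_get_page(struct sim_zone *zl, int nr_zones,
			unsigned int alloc_flags, unsigned long mems_allowed)
{
	for (int i = 0; i < nr_zones; i++) {
		struct sim_zone *z = &zl[i];

		if ((alloc_flags & SIM_ALLOC_CPUSET) &&
		    !sim_node_allowed(mems_allowed, z->node))
			continue;
		if (z->free_pages <= 0)
			continue;
		if (!(alloc_flags & SIM_ALLOC_NO_WATERMARKS) &&
		    z->free_pages - 1 < z->min_wmark)
			continue;
		z->free_pages--;
		return i;
	}
	return -1;
}
```

Suppose the cpuset allows only node 1 and the zonelist begins on node 0. Without the cpuset flag, the reserve page comes out of node 0. Repeating the call drains node 0 completely before node 1 is ever touched.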
* Re: [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-25 20:57 ` David Rientjes
@ 2015-11-26  9:34   ` Michal Hocko
  2015-11-30 22:17     ` David Rientjes
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2015-11-26  9:34 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Wed 25-11-15 12:57:08, David Rientjes wrote:
> On Wed, 25 Nov 2015, Michal Hocko wrote:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 8034909faad2..94b04c1e894a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2766,8 +2766,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  		goto out;
> >  	}
> >  	/* Exhausted what can be done so it's blamo time */
> > -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> > +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> >  		*did_some_progress = 1;
> > +
> > +		if (gfp_mask & __GFP_NOFAIL)
> > +			page = get_page_from_freelist(gfp_mask, order,
> > +					ALLOC_NO_WATERMARKS, ac);
> > +	}
> >  out:
> >  	mutex_unlock(&oom_lock);
> >  	return page;
> 
> Well, sure, that's one way to do it, but for cpuset users, wouldn't this
> lead to a depletion of the first system zone since you've dropped
> ALLOC_CPUSET and are doing ALLOC_NO_WATERMARKS in the same call?

Are you suggesting this?
	if (gfp_mask & __GFP_NOFAIL) {
		page = get_page_from_freelist(gfp_mask, order,
				ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
		/*
		 * fallback to ignore cpuset if our nodes are
		 * depleted
		 */
		if (!page)
			get_page_from_freelist(gfp_mask, order,
					ALLOC_NO_WATERMARKS, ac);
	}

I am not really sure this is worth the complication. __GFP_NOFAIL should
be relatively rare, and nodes are rarely depleted so much that
ALLOC_NO_WATERMARKS wouldn't be able to allocate from the first zone in
the zone list. I have no problem doing the above; it just sounds like
overcomplicating the situation without making a practical difference.
If you and others insist, I can respin the patch though.
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-26  9:34 ` Michal Hocko
@ 2015-11-30 22:17   ` David Rientjes
  2015-12-02 15:07     ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2015-11-30 22:17 UTC (permalink / raw)
To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Thu, 26 Nov 2015, Michal Hocko wrote:

> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 8034909faad2..94b04c1e894a 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2766,8 +2766,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >  		goto out;
> > >  	}
> > >  	/* Exhausted what can be done so it's blamo time */
> > > -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> > > +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > >  		*did_some_progress = 1;
> > > +
> > > +		if (gfp_mask & __GFP_NOFAIL)
> > > +			page = get_page_from_freelist(gfp_mask, order,
> > > +					ALLOC_NO_WATERMARKS, ac);
> > > +	}
> > >  out:
> > >  	mutex_unlock(&oom_lock);
> > >  	return page;
> > 
> > Well, sure, that's one way to do it, but for cpuset users, wouldn't this
> > lead to a depletion of the first system zone since you've dropped
> > ALLOC_CPUSET and are doing ALLOC_NO_WATERMARKS in the same call?
> 
> Are you suggesting this?
> 	if (gfp_mask & __GFP_NOFAIL) {
> 		page = get_page_from_freelist(gfp_mask, order,
> 				ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> 		/*
> 		 * fallback to ignore cpuset if our nodes are
> 		 * depleted
> 		 */
> 		if (!page)
> 			get_page_from_freelist(gfp_mask, order,
> 					ALLOC_NO_WATERMARKS, ac);
> 	}
> 
> I am not really sure this is worth the complication.

I'm objecting to the ability of a process that is doing a __GFP_NOFAIL
allocation, which has been disallowed access from allocating on certain
mems through cpusets, to cause an oom condition on those disallowed nodes,
yes.
* Re: [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-30 22:17 ` David Rientjes
@ 2015-12-02 15:07   ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2015-12-02 15:07 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Mon 30-11-15 14:17:03, David Rientjes wrote:
> On Thu, 26 Nov 2015, Michal Hocko wrote:
> 
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 8034909faad2..94b04c1e894a 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -2766,8 +2766,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > > >  		goto out;
> > > >  	}
> > > >  	/* Exhausted what can be done so it's blamo time */
> > > > -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> > > > +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > > >  		*did_some_progress = 1;
> > > > +
> > > > +		if (gfp_mask & __GFP_NOFAIL)
> > > > +			page = get_page_from_freelist(gfp_mask, order,
> > > > +					ALLOC_NO_WATERMARKS, ac);
> > > > +	}
> > > >  out:
> > > >  	mutex_unlock(&oom_lock);
> > > >  	return page;
> > > 
> > > Well, sure, that's one way to do it, but for cpuset users, wouldn't this
> > > lead to a depletion of the first system zone since you've dropped
> > > ALLOC_CPUSET and are doing ALLOC_NO_WATERMARKS in the same call?
> > 
> > Are you suggesting this?
> > 	if (gfp_mask & __GFP_NOFAIL) {
> > 		page = get_page_from_freelist(gfp_mask, order,
> > 				ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> > 		/*
> > 		 * fallback to ignore cpuset if our nodes are
> > 		 * depleted
> > 		 */
> > 		if (!page)
> > 			get_page_from_freelist(gfp_mask, order,
> > 					ALLOC_NO_WATERMARKS, ac);
> > 	}
> > 
> > I am not really sure this is worth the complication.
> 
> I'm objecting to the ability of a process that is doing a __GFP_NOFAIL
> allocation, which has been disallowed access from allocating on certain
> mems through cpusets, to cause an oom condition on those disallowed nodes,
> yes.

That ability will be there even with the fallback mechanism. My primary
objection was that the fallback is unnecessarily complex without any
evidence that such a situation would happen in real life often enough to
bother about it. __GFP_NOFAIL allocations are, and should be, rare, and
any runaway triggerable from userspace is a kernel bug.

Anyway, as you seem to feel really strongly about this, I will post v2
with the above fallback. This is a superslow path anyway...
-- 
Michal Hocko
SUSE Labs
* [PATCH v2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-25 10:40 ` [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves Michal Hocko
  2015-11-25 10:51 ` David Rientjes
@ 2015-12-02 15:13 ` Michal Hocko
  2015-12-03  0:01   ` David Rientjes
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2015-12-02 15:13 UTC (permalink / raw)
To: Andrew Morton
Cc: David Rientjes, Johannes Weiner, Mel Gorman, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__GFP_NOFAIL is a big hammer used to ensure that the allocation
request can never fail. This is a strong requirement and as such
it also deserves special treatment when the system is OOM. The
primary problem here is that the allocation request might have
come with some locks held and the oom victim might be blocked
on the same locks. This is basically an OOM deadlock situation.

This patch tries to reduce the risk of such deadlocks by giving
__GFP_NOFAIL allocations special treatment and letting them dive
into memory reserves after oom killer invocation. This should help
them make progress and release the resources they are holding. The
OOM victim should compensate for the reserves consumption.
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8034909faad2..367523b2948b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2766,8 +2766,21 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		goto out;
 	}
 	/* Exhausted what can be done so it's blamo time */
-	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
+	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
 		*did_some_progress = 1;
+
+		if (gfp_mask & __GFP_NOFAIL) {
+			page = get_page_from_freelist(gfp_mask, order,
+					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
+			/*
+			 * fallback to ignore cpuset restriction if our nodes
+			 * are depleted
+			 */
+			if (!page)
+				page = get_page_from_freelist(gfp_mask, order,
+					ALLOC_NO_WATERMARKS, ac);
+		}
+	}
 out:
 	mutex_unlock(&oom_lock);
 	return page;
-- 
2.6.2
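The v2 fallback can be modelled the same way. The first attempt uses the reserves only on nodes the cpuset allows. Only if that fails is the cpuset ignored. Note that the second attempt's result must be assigned back to the page pointer, as v2 does; the variant proposed earlier in the thread discarded it. All names and flag values below are hypothetical. Because both attempts ignore watermarks, the model leaves them out, and it also ignores the order, migratetypes and lowmem_reserve.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for this model; it is NOT the kernel's. */
#define SIM_ALLOC_CPUSET 0x2u

struct sim_zone {
	int  node;       /* NUMA node the zone belongs to */
	long free_pages; /* free pages, reserve included */
};

/* Hypothetical cpuset representation: bit n set means node n is allowed. */
static bool sim_node_allowed(unsigned long mems_allowed, int node)
{
	assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)));
	return (mems_allowed >> node) & 1UL;
}

/*
 * A watermark-ignoring pass over the zonelist: take a page from the first
 * zone that has one, skipping disallowed nodes only when SIM_ALLOC_CPUSET
 * is set. Returns the zone index used, or -1.
 */
static int sim_get_page_no_wmarks(struct sim_zone *zl, int nr_zones,
				  unsigned int alloc_flags,
				  unsigned long mems_allowed)
{
	for (int i = 0; i < nr_zones; i++) {
		if ((alloc_flags & SIM_ALLOC_CPUSET) &&
		    !sim_node_allowed(mems_allowed, zl[i].node))
			continue;
		if (zl[i].free_pages > 0) {
			zl[i].free_pages--;
			return i;
		}
	}
	return -1;
}

/* Models v2: cpuset-restricted reserves first, unrestricted as fallback. */
static int sim_nofail_reserve_alloc(struct sim_zone *zl, int nr_zones,
				    unsigned long mems_allowed)
{
	int idx = sim_get_page_no_wmarks(zl, nr_zones, SIM_ALLOC_CPUSET,
					 mems_allowed);

	/* Ignore the cpuset only if its nodes are depleted; keep the result. */
	if (idx < 0)
		idx = sim_get_page_no_wmarks(zl, nr_zones, 0, mems_allowed);
	return idx;
}
```

As long as the allowed node still has a page, the disallowed node is left alone. Once the allowed node runs dry, the fallback crosses the cpuset boundary. Michal notes in the thread that this ability remains even with the fallback.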
* Re: [PATCH v2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-12-02 15:13 ` [PATCH v2] " Michal Hocko
@ 2015-12-03  0:01   ` David Rientjes
  0 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2015-12-03  0:01 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Mel Gorman, linux-mm, LKML, Michal Hocko

On Wed, 2 Dec 2015, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> __GFP_NOFAIL is a big hammer used to ensure that the allocation
> request can never fail. This is a strong requirement and as such
> it also deserves special treatment when the system is OOM. The
> primary problem here is that the allocation request might have
> come with some locks held and the oom victim might be blocked
> on the same locks. This is basically an OOM deadlock situation.
> 
> This patch tries to reduce the risk of such deadlocks by giving
> __GFP_NOFAIL allocations special treatment and letting them dive
> into memory reserves after oom killer invocation. This should help
> them make progress and release the resources they are holding. The
> OOM victim should compensate for the reserves consumption.
> 
> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: David Rientjes <rientjes@google.com>
* [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures
  2015-11-25 10:40 [PATCH 0/2] GFP_NOFAIL reserves + warning about reserves depletion Michal Hocko
  2015-11-25 10:40 ` [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves Michal Hocko
@ 2015-11-25 10:40 ` Michal Hocko
  2015-11-25 10:59   ` David Rientjes
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2015-11-25 10:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, David Rientjes, Johannes Weiner, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

ALLOC_NO_WATERMARKS requests can dive into memory reserves without any
restriction. They are used only in emergencies to allow forward memory
reclaim progress, assuming the caller will return the memory shortly
(e.g. {__GFP,PF}_MEMALLOC requests, an OOM victim on its way to exit, or
__GFP_NOFAIL requests hitting OOM). There is no guarantee that such
requests succeed, because memory reserves might get depleted as well.
This might be either the result of a bug where memory reserves are
abused or the result of an overly optimistic configuration of memory
reserves.

This patch makes sure that the administrator gets a warning when these
requests fail, with a hint that min_free_kbytes might be used to
increase the amount of memory reserves. The warning might also help us
check whether the issue is caused by a buggy user or by the
configuration. To avoid flooding the logs, the warning is one-off, but
it is allowed to trigger again after min_free_kbytes has been updated.
Something really bad is clearly going on if the warning hits even after
multiple updates of min_free_kbytes.
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 70db11c27046..6a05d771cb08 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -240,6 +240,8 @@ compound_page_dtor * const compound_page_dtors[] = {
 #endif
 };
 
+/* warn about depleted watermarks */
+static bool warn_alloc_no_wmarks;
 int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 
@@ -2642,6 +2644,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	if (zonelist_rescan)
 		goto zonelist_scan;
 
+	/* WARN only once unless min_free_kbytes is updated */
+	if (warn_alloc_no_wmarks && (alloc_flags & ALLOC_NO_WATERMARKS)) {
+		warn_alloc_no_wmarks = 0;
+		WARN(1, "Memory reserves are depleted for order:%d, mode:0x%x."
+			" You might consider increasing min_free_kbytes\n",
+			order, gfp_mask);
+	}
 	return NULL;
 }
 
@@ -6048,6 +6057,9 @@ static void __setup_per_zone_wmarks(void)
 	struct zone *zone;
 	unsigned long flags;
 
+	/* Warn when ALLOC_NO_WATERMARKS request fails */
+	warn_alloc_no_wmarks = 1;
+
 	/* Calculate total number of !ZONE_HIGHMEM pages */
 	for_each_zone(zone) {
 		if (!is_highmem(zone))
-- 
2.6.2
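The one-shot, re-armable warning in patch 2 reduces to a flag that the failure path clears and the watermark setup sets. Here is a user-space sketch of that pattern. The sim_* names, the flag value and the order bound are hypothetical. In the kernel, the re-arm happens in __setup_per_zone_wmarks(), which runs whenever watermarks are recomputed, including after a write to min_free_kbytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SIM_ALLOC_NO_WATERMARKS 0x1u /* hypothetical value */
#define SIM_MAX_ORDER 11             /* assumed: orders 0..10 */

static bool sim_warn_armed; /* models warn_alloc_no_wmarks */
static int sim_warn_count;  /* number of warnings printed so far */

/* Models __setup_per_zone_wmarks(): (re)computing watermarks re-arms. */
static void sim_setup_wmarks(void)
{
	sim_warn_armed = true;
}

/*
 * Models the tail of get_page_from_freelist() after patch 2: called when
 * no zone could serve the request. Like the patch, the test-and-clear is
 * not atomic, so two racing failures could both print.
 */
static void sim_alloc_failed(unsigned int order, unsigned int gfp,
			     unsigned int alloc_flags)
{
	assert(order < SIM_MAX_ORDER);
	if (sim_warn_armed && (alloc_flags & SIM_ALLOC_NO_WATERMARKS)) {
		sim_warn_armed = false;
		sim_warn_count++;
		fprintf(stderr, "Memory reserves are depleted for order:%u, "
			"mode:0x%x. You might consider increasing "
			"min_free_kbytes\n", order, gfp);
	}
}
```

Ordinary failures never print. The first reserve-dipping failure prints once, and later ones stay silent until the watermarks are set up again.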
* Re: [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures
  2015-11-25 10:40 ` [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures Michal Hocko
@ 2015-11-25 10:59 ` David Rientjes
  2015-11-25 11:55   ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2015-11-25 10:59 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML, Michal Hocko

On Wed, 25 Nov 2015, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> ALLOC_NO_WATERMARKS requests can dive into memory reserves without any
> restriction. They are used only in emergencies to allow forward memory
> reclaim progress, assuming the caller will return the memory shortly
> (e.g. {__GFP,PF}_MEMALLOC requests, an OOM victim on its way to exit, or
> __GFP_NOFAIL requests hitting OOM). There is no guarantee that such
> requests succeed, because memory reserves might get depleted as well.
> This might be either the result of a bug where memory reserves are
> abused or the result of an overly optimistic configuration of memory
> reserves.
> 
> This patch makes sure that the administrator gets a warning when these
> requests fail, with a hint that min_free_kbytes might be used to
> increase the amount of memory reserves. The warning might also help us
> check whether the issue is caused by a buggy user or by the
> configuration. To avoid flooding the logs, the warning is one-off, but
> it is allowed to trigger again after min_free_kbytes has been updated.
> Something really bad is clearly going on if the warning hits even after
> multiple updates of min_free_kbytes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 70db11c27046..6a05d771cb08 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -240,6 +240,8 @@ compound_page_dtor * const compound_page_dtors[] = {
>  #endif
>  };
>  
> +/* warn about depleted watermarks */
> +static bool warn_alloc_no_wmarks;
>  int min_free_kbytes = 1024;
>  int user_min_free_kbytes = -1;
>  
> @@ -2642,6 +2644,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  	if (zonelist_rescan)
>  		goto zonelist_scan;
>  
> +	/* WARN only once unless min_free_kbytes is updated */
> +	if (warn_alloc_no_wmarks && (alloc_flags & ALLOC_NO_WATERMARKS)) {
> +		warn_alloc_no_wmarks = 0;
> +		WARN(1, "Memory reserves are depleted for order:%d, mode:0x%x."
> +			" You might consider increasing min_free_kbytes\n",
> +			order, gfp_mask);
> +	}
>  	return NULL;
>  }
>  

Doesn't this warn for high-order allocations prior to the first call to
direct compaction whereas min_free_kbytes may be irrelevant?  Providing
the order is good, but there's no indication when min_free_kbytes may be
helpful from this warning.  WARN() isn't even going to show the state of
memory.
* Re: [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures
  2015-11-25 10:59 ` David Rientjes
@ 2015-11-25 11:55   ` Michal Hocko
  2015-11-25 21:01     ` David Rientjes
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2015-11-25 11:55 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Wed 25-11-15 02:59:19, David Rientjes wrote:
> On Wed, 25 Nov 2015, Michal Hocko wrote:
[...]
> > @@ -2642,6 +2644,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >  	if (zonelist_rescan)
> >  		goto zonelist_scan;
> >  
> > +	/* WARN only once unless min_free_kbytes is updated */
> > +	if (warn_alloc_no_wmarks && (alloc_flags & ALLOC_NO_WATERMARKS)) {
> > +		warn_alloc_no_wmarks = 0;
> > +		WARN(1, "Memory reserves are depleted for order:%d, mode:0x%x."
> > +			" You might consider increasing min_free_kbytes\n",
> > +			order, gfp_mask);
> > +	}
> >  	return NULL;
> >  }
> >  
> 
> Doesn't this warn for high-order allocations prior to the first call to
> direct compaction whereas min_free_kbytes may be irrelevant?

Hmm, you are concerned about high-order ALLOC_NO_WATERMARKS allocations
which happen prior to compaction, right? I am wondering whether there
are reasonable chances that compaction would make a difference if we
are so depleted that there is not a single free block of the requested
order or higher. High-order ALLOC_NO_WATERMARKS allocations should be
rare, if they exist at all.

> Providing
> the order is good, but there's no indication when min_free_kbytes may be
> helpful from this warning.

I am not sure I understand what you mean here.

> WARN() isn't even going to show the state of memory.

I was considering doing that, but it would make the code unnecessarily
more complex. If the allocation is allowed to fail, it would dump the
allocation failure. The purpose of the message is to tell us that
reserves are not sufficient. I am not sure seeing the memory state dump
would help us much more.
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures
  2015-11-25 11:55 ` Michal Hocko
@ 2015-11-25 21:01   ` David Rientjes
  2015-11-26  9:52     ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2015-11-25 21:01 UTC (permalink / raw)
To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Wed, 25 Nov 2015, Michal Hocko wrote:

> > > @@ -2642,6 +2644,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> > >  	if (zonelist_rescan)
> > >  		goto zonelist_scan;
> > >  
> > > +	/* WARN only once unless min_free_kbytes is updated */
> > > +	if (warn_alloc_no_wmarks && (alloc_flags & ALLOC_NO_WATERMARKS)) {
> > > +		warn_alloc_no_wmarks = 0;
> > > +		WARN(1, "Memory reserves are depleted for order:%d, mode:0x%x."
> > > +			" You might consider increasing min_free_kbytes\n",
> > > +			order, gfp_mask);
> > > +	}
> > >  	return NULL;
> > >  }
> > >  
> > 
> > Doesn't this warn for high-order allocations prior to the first call to
> > direct compaction whereas min_free_kbytes may be irrelevant?
> 
> Hmm, you are concerned about high-order ALLOC_NO_WATERMARKS allocations
> which happen prior to compaction, right? I am wondering whether there
> are reasonable chances that compaction would make a difference if we
> are so depleted that there is not a single free block of the requested
> order or higher. High-order ALLOC_NO_WATERMARKS allocations should be
> rare, if they exist at all.
> 

No, I'm concerned about get_page_from_freelist() failing for an order-9
allocation due to _fragmentation_ and then emitting this warning although
free watermarks may be gigabytes of memory higher than min watermarks.

> > Providing
> > the order is good, but there's no indication when min_free_kbytes may be
> > helpful from this warning.
> 
> I am not sure I understand what you mean here.
> 

You show the order of the failed allocation in your new warning.  Good.
It won't help to raise min_free_kbytes to infinity if the high-order allocation failed due to fragmentation. Does that make sense? > > WARN() isn't even going to show the state of memory. > > I was considering to do that but it would make the code unnecessarily > more complex. If the allocation is allowed to fail it would dump the > allocation failure. The purpose of the message is to tell us that > reserves are not sufficient. I am not sure seeing the memory state dump > would help us much more. > If the purpsoe of the message is to tell us when reserves are insufficient, it doesn't achieve that purpose if allocations fail due to fragmentation or lowmem_reserve_ratio. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
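David's fragmentation argument can be illustrated with a toy model. The sketch below is hypothetical userspace C, not kernel code: struct zone_sketch, MAX_ORDER_SKETCH (11, assumed to match the default MAX_ORDER of that era) and both helpers are invented for illustration, and 4 KiB pages are assumed. It shows that a zone holding only order-0 free blocks can sit far above its min watermark while still having no block large enough for an order-9 request, so a request that ignores watermarks fails anyway.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one zone's buddy free lists (not kernel API).
 * Orders 0..10 are assumed, mirroring a default MAX_ORDER of 11. */
#define MAX_ORDER_SKETCH 11

struct zone_sketch {
	unsigned long nr_free[MAX_ORDER_SKETCH]; /* free blocks per order */
	unsigned long min_wmark;                 /* min watermark, in pages */
};

/* Total free pages: each free block of order k covers 2^k pages. */
static unsigned long zone_free_pages(const struct zone_sketch *z)
{
	unsigned long total = 0;

	for (unsigned int k = 0; k < MAX_ORDER_SKETCH; k++)
		total += z->nr_free[k] << k;
	return total;
}

/* With watermarks ignored (the ALLOC_NO_WATERMARKS case), a request of
 * this order can only succeed if a free block of at least that order
 * exists; a larger block could be split down to size. */
static bool zone_has_block(const struct zone_sketch *z, unsigned int order)
{
	assert(order < MAX_ORDER_SKETCH);
	for (unsigned int k = order; k < MAX_ORDER_SKETCH; k++)
		if (z->nr_free[k] > 0)
			return true;
	return false;
}
```

For example, 500000 free order-0 pages (about 1.9 GiB with 4 KiB pages) against a 16384-page min watermark leave zone_has_block(&z, 9) false even though zone_free_pages(&z) is roughly 30 times the watermark. In this model, raising min_wmark changes neither helper's result.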
* Re: [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures
From: Michal Hocko @ 2015-11-26 9:52 UTC
To: David Rientjes; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Wed 25-11-15 13:01:56, David Rientjes wrote:
> On Wed, 25 Nov 2015, Michal Hocko wrote:
>
> > > > @@ -2642,6 +2644,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> > > >  	if (zonelist_rescan)
> > > >  		goto zonelist_scan;
> > > >
> > > > +	/* WARN only once unless min_free_kbytes is updated */
> > > > +	if (warn_alloc_no_wmarks && (alloc_flags & ALLOC_NO_WATERMARKS)) {
> > > > +		warn_alloc_no_wmarks = 0;
> > > > +		WARN(1, "Memory reserves are depleted for order:%d, mode:0x%x."
> > > > +			" You might consider increasing min_free_kbytes\n",
> > > > +			order, gfp_mask);
> > > > +	}
> > > >  	return NULL;
> > > >  }
> > > >
> > >
> > > Doesn't this warn for high-order allocations prior to the first call to
> > > direct compaction whereas min_free_kbytes may be irrelevant?
> >
> > Hmm, you are concerned about high order ALLOC_NO_WATERMARKS allocation
> > which happen prior to compaction, right? I am wondering whether there
> > are reasonable chances that a compaction would make a difference if we
> > are so depleted that there is no single page with >= order.
> > ALLOC_NO_WATERMARKS with high order allocations should be rare if
> > existing at all.
> >
>
> No, I'm concerned about get_page_from_freelist() failing for an order-9
> allocation due to _fragmentation_ and then emitting this warning although
> free watermarks may be gigabytes of memory higher than min watermarks.

Hmm, should we allow ALLOC_NO_WATERMARKS for order-9 (or > PAGE_ALLOC_COSTLY_ORDER
for that matter) allocations though? What would be the point if they are
allowed to fail and so they cannot be relied on inherently?
I can see that we might do that currently - e.g. TIF_MEMDIE might be set
while doing hugetlb page allocation but I seriously doubt that this is
intentional and probably worth fixing.

> > > Providing
> > > the order is good, but there's no indication when min_free_kbytes may be
> > > helpful from this warning.
> >
> > I am not sure I understand what you mean here.
> >
>
> You show the order of the failed allocation in your new warning. Good.
> It won't help to raise min_free_kbytes to infinity if the high-order
> allocation failed due to fragmentation. Does that make sense?

Sure this makes sense but as I've tried to argue the warning is just a
hint. It should warn that something unexpected is happening and offer a
workaround. And yes increasing min_free_kbytes helps to keep more high
order pages available from my experience. If the workaround doesn't help
I suspect the bug report would come more promptly. Your example about
order-9 ALLOC_NO_WATERMARKS failure is more than exaggerated IMHO.

> > > WARN() isn't even going to show the state of memory.
> >
> > I was considering to do that but it would make the code unnecessarily
> > more complex. If the allocation is allowed to fail it would dump the
> > allocation failure. The purpose of the message is to tell us that
> > reserves are not sufficient. I am not sure seeing the memory state dump
> > would help us much more.
> >
>
> If the purpose of the message is to tell us when reserves are
> insufficient, it doesn't achieve that purpose if allocations fail due to
> fragmentation or lowmem_reserve_ratio.

Do you have any better suggestion or you just think that warning about
depleted reserves doesn't make any sense at all?
--
Michal Hocko
SUSE Labs
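Michal's counter-question, whether requests above PAGE_ALLOC_COSTLY_ORDER should be granted ALLOC_NO_WATERMARKS at all, amounts to a simple policy gate. The sketch below is hypothetical: grant_no_watermarks() and its oom_victim parameter (standing in for a TIF_MEMDIE task) are invented names, not kernel functions; only the value 3 for PAGE_ALLOC_COSTLY_ORDER is taken from the kernel, and MAX_ORDER_SKETCH is an assumed default of 11.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER value. */
#define PAGE_ALLOC_COSTLY_ORDER 3
/* Assumed highest order + 1, as in a default MAX_ORDER of 11. */
#define MAX_ORDER_SKETCH 11

/* Hypothetical policy: only OOM victims (oom_victim stands in for
 * TIF_MEMDIE) may ignore watermarks, and only for orders that are not
 * "costly". Costly requests are allowed to fail anyway, so letting them
 * drain the reserves would not make them reliable. */
static bool grant_no_watermarks(unsigned int order, bool oom_victim)
{
	assert(order < MAX_ORDER_SKETCH);
	if (!oom_victim)
		return false;
	return order <= PAGE_ALLOC_COSTLY_ORDER;
}
```

Under such a gate, an order-9 hugetlb allocation by a TIF_MEMDIE task would take the normal failure path instead of draining the reserves, which is the case Michal suspects is unintentional.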
* Re: [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures
From: David Rientjes @ 2015-11-30 22:24 UTC
To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, LKML

On Thu, 26 Nov 2015, Michal Hocko wrote:

> > > > > @@ -2642,6 +2644,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> > > > >  	if (zonelist_rescan)
> > > > >  		goto zonelist_scan;
> > > > >
> > > > > +	/* WARN only once unless min_free_kbytes is updated */
> > > > > +	if (warn_alloc_no_wmarks && (alloc_flags & ALLOC_NO_WATERMARKS)) {
> > > > > +		warn_alloc_no_wmarks = 0;
> > > > > +		WARN(1, "Memory reserves are depleted for order:%d, mode:0x%x."
> > > > > +			" You might consider increasing min_free_kbytes\n",
> > > > > +			order, gfp_mask);
> > > > > +	}
> > > > >  	return NULL;
> > > > >  }
> > > > >
> > > >
> > > > Doesn't this warn for high-order allocations prior to the first call to
> > > > direct compaction whereas min_free_kbytes may be irrelevant?
> > >
> > > Hmm, you are concerned about high order ALLOC_NO_WATERMARKS allocation
> > > which happen prior to compaction, right? I am wondering whether there
> > > are reasonable chances that a compaction would make a difference if we
> > > are so depleted that there is no single page with >= order.
> > > ALLOC_NO_WATERMARKS with high order allocations should be rare if
> > > existing at all.
> > >
> >
> > No, I'm concerned about get_page_from_freelist() failing for an order-9
> > allocation due to _fragmentation_ and then emitting this warning although
> > free watermarks may be gigabytes of memory higher than min watermarks.
>
> Hmm, should we allow ALLOC_NO_WATERMARKS for order-9 (or > PAGE_ALLOC_COSTLY_ORDER
> for that matter) allocations though? What would be the point if they are
> allowed to fail and so they cannot be relied on inherently?

This patch isn't addressing what orders the page allocator allows access
to memory reserves for, I'm not sure this has anything to do with the
warning you propose to add.

My concern is that this will start doing

	Memory reserves are depleted for order:9. You might consider increasing min_free_kbytes

in the kernel log with a long stack trace that is going to grab attention
and then some user will actually follow the advice and see that the
warning persists because the failure was due to fragmentation rather than
watermarks.

It would be much better if the warning were only emitted when the
_watermark_, not fragmentation, was the source of the failure. That is
very easy to do, by calling __zone_watermark_ok() for order 0.

I would also suggest that this is done in the same way that GFP_ATOMIC
allocations fail that have depleted ALLOC_HARD and ALLOC_HARDER memory
reserves, with something resembling a page allocation failure warning
that actually presents useful data. Your patch is already insufficient
because it doesn't handle __GFP_NOWARN.
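David's two concrete requests (warn only when an order-0 watermark check shows the reserves really are exhausted, and honor __GFP_NOWARN) can be combined with the patch's warn-once flag that a min_free_kbytes update re-arms. This is a hypothetical userspace sketch, not the kernel's code: struct zone_state, order0_watermark_ok() (a simplified stand-in for __zone_watermark_ok() at order 0 that ignores lowmem_reserve and the ALLOC_HIGH/ALLOC_HARDER adjustments), should_warn_reserves_depleted() and min_free_kbytes_updated() are all invented names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state consulted when an ALLOC_NO_WATERMARKS
 * request has just failed in a zone (names are illustrative only). */
struct zone_state {
	unsigned long free_pages; /* total free pages in the zone */
	unsigned long min_wmark;  /* min watermark, in pages */
};

/* One-shot flag, as in the patch: cleared when the warning fires and
 * re-armed when min_free_kbytes is written. */
static bool warn_armed = true;

/* Simplified stand-in for an order-0 __zone_watermark_ok() check:
 * lowmem_reserve and the ALLOC_HIGH/ALLOC_HARDER adjustments are ignored. */
static bool order0_watermark_ok(const struct zone_state *z)
{
	return z->free_pages > z->min_wmark;
}

/* Decide whether the "reserves are depleted" hint is worth printing. */
static bool should_warn_reserves_depleted(const struct zone_state *z,
					  bool gfp_nowarn)
{
	assert(z);
	/* The caller asked for silence (__GFP_NOWARN), or we already warned. */
	if (gfp_nowarn || !warn_armed)
		return false;
	/* Order-0 memory is still above the min watermark: the failure came
	 * from fragmentation, and raising min_free_kbytes would not help. */
	if (order0_watermark_ok(z))
		return false;
	warn_armed = false;
	return true;
}

/* Hypothetical hook for the min_free_kbytes sysctl handler. */
static void min_free_kbytes_updated(void)
{
	warn_armed = true;
}
```

With this check, a fragmented zone like the one in David's order-9 example never triggers the hint, so the min_free_kbytes advice would only be printed when it could plausibly help.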
Thread overview: 16+ messages
2015-11-25 10:40 [PATCH 0/2] GFP_NOFAIL reserves + warning about reserves depletion Michal Hocko
2015-11-25 10:40 ` [PATCH 1/2] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves Michal Hocko
2015-11-25 10:51 ` David Rientjes
2015-11-25 11:18 ` Michal Hocko
2015-11-25 20:57 ` David Rientjes
2015-11-26 9:34 ` Michal Hocko
2015-11-30 22:17 ` David Rientjes
2015-12-02 15:07 ` Michal Hocko
2015-12-02 15:13 ` [PATCH v2] " Michal Hocko
2015-12-03 0:01 ` David Rientjes
2015-11-25 10:40 ` [PATCH 2/2] mm: warn about ALLOC_NO_WATERMARKS request failures Michal Hocko
2015-11-25 10:59 ` David Rientjes
2015-11-25 11:55 ` Michal Hocko
2015-11-25 21:01 ` David Rientjes
2015-11-26 9:52 ` Michal Hocko
2015-11-30 22:24 ` David Rientjes