[PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
@ 2015-11-11 13:48 mhocko
  2015-11-11 15:54 ` Johannes Weiner
  2015-11-22 12:55 ` Vlastimil Babka
  0 siblings, 2 replies; 13+ messages in thread
From: mhocko @ 2015-11-11 13:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Andrea Arcangeli, Mel Gorman, David Rientjes,
	linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__GFP_NOFAIL is a big hammer used to ensure that the allocation
request can never fail. This is a strong requirement and as such
it also deserves a special treatment when the system is OOM. The
primary problem here is that the allocation request might have
come with some locks held and the oom victim might be blocked
on the same locks. This is basically an OOM deadlock situation.

This patch tries to reduce the risk of such a deadlocks by giving
__GFP_NOFAIL allocations a special treatment and let them dive into
memory reserves after oom killer invocation. This should help them
to make a progress and release resources they are holding. The OOM
victim should compensate for the reserves consumption.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
this has been posted previously as a part of larger GFP_NOFS related
patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org)
but Andrea was asking basically the same thing at LSF early this year
(I cannot seem to find it in any public archive though). I think the
patch makes some sense on its own.

Comments?

 mm/page_alloc.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8034909faad2..d30bce9d7ac8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 			goto out;
 	}
 	/* Exhausted what can be done so it's blamo time */
-	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
+	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
 		*did_some_progress = 1;
+
+		if (gfp_mask & __GFP_NOFAIL) {
+			page = get_page_from_freelist(gfp_mask, order,
+					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
+			WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
+				    " Consider increasing min_free_kbytes.\n");
+		}
+	}
 out:
 	mutex_unlock(&oom_lock);
 	return page;
-- 
2.6.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-11 13:48 [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko
@ 2015-11-11 15:54 ` Johannes Weiner
  2015-11-12  8:51   ` Michal Hocko
  2015-11-22 12:55 ` Vlastimil Babka
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2015-11-11 15:54 UTC (permalink / raw)
  To: mhocko
  Cc: Andrew Morton, Andrea Arcangeli, Mel Gorman, David Rientjes,
	linux-mm, LKML, Michal Hocko

On Wed, Nov 11, 2015 at 02:48:17PM +0100, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __GFP_NOFAIL is a big hammer used to ensure that the allocation
> request can never fail. This is a strong requirement and as such
> it also deserves a special treatment when the system is OOM. The
> primary problem here is that the allocation request might have
> come with some locks held and the oom victim might be blocked
> on the same locks. This is basically an OOM deadlock situation.
> 
> This patch tries to reduce the risk of such a deadlocks by giving
> __GFP_NOFAIL allocations a special treatment and let them dive into
> memory reserves after oom killer invocation. This should help them
> to make a progress and release resources they are holding. The OOM
> victim should compensate for the reserves consumption.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> this has been posted previously as a part of larger GFP_NOFS related
> patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org)
> but Andrea was asking basically the same thing at LSF early this year
> (I cannot seem to find it in any public archive though). I think the
> patch makes some sense on its own.

I sent this right after LSF based on Andrea's suggestion:

https://lkml.org/lkml/2015/3/25/37

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-11 15:54 ` Johannes Weiner
@ 2015-11-12  8:51   ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2015-11-12  8:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Andrea Arcangeli, Mel Gorman, David Rientjes,
	linux-mm, LKML

On Wed 11-11-15 10:54:46, Johannes Weiner wrote:
> On Wed, Nov 11, 2015 at 02:48:17PM +0100, mhocko@kernel.org wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > __GFP_NOFAIL is a big hammer used to ensure that the allocation
> > request can never fail. This is a strong requirement and as such
> > it also deserves a special treatment when the system is OOM. The
> > primary problem here is that the allocation request might have
> > come with some locks held and the oom victim might be blocked
> > on the same locks. This is basically an OOM deadlock situation.
> > 
> > This patch tries to reduce the risk of such a deadlocks by giving
> > __GFP_NOFAIL allocations a special treatment and let them dive into
> > memory reserves after oom killer invocation. This should help them
> > to make a progress and release resources they are holding. The OOM
> > victim should compensate for the reserves consumption.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> > 
> > Hi,
> > this has been posted previously as a part of larger GFP_NOFS related
> > patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org)
> > but Andrea was asking basically the same thing at LSF early this year
> > (I cannot seem to find it in any public archive though). I think the
> > patch makes some sense on its own.
> 
> I sent this right after LSF based on Andrea's suggestion:
> 
> https://lkml.org/lkml/2015/3/25/37

Ohh, I completely forgot as it was part of a larger series.
Thanks for the pointer.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-11 13:48 [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko
  2015-11-11 15:54 ` Johannes Weiner
@ 2015-11-22 12:55 ` Vlastimil Babka
  2015-11-23  9:29   ` Michal Hocko
  1 sibling, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2015-11-22 12:55 UTC (permalink / raw)
  To: mhocko, Andrew Morton
  Cc: Johannes Weiner, Andrea Arcangeli, Mel Gorman, David Rientjes,
	linux-mm, LKML, Michal Hocko

On 11.11.2015 14:48, mhocko@kernel.org wrote:
>  mm/page_alloc.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8034909faad2..d30bce9d7ac8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  			goto out;
>  	}
>  	/* Exhausted what can be done so it's blamo time */
> -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>  		*did_some_progress = 1;
> +
> +		if (gfp_mask & __GFP_NOFAIL) {
> +			page = get_page_from_freelist(gfp_mask, order,
> +					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> +			WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
> +				    " Consider increasing min_free_kbytes.\n");

It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part?
Also s/gfp_nofail/GFP_NOFAIL/ for consistency?

Hm and probably out of scope of your patch, but I understand the WARN_ONCE
(WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping
here. But for distinct tasks and potentially far away in time, wouldn't we want
to see all the warnings? Would that be feasible to implement?

> +		}
> +	}
>  out:
>  	mutex_unlock(&oom_lock);
>  	return page;
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-22 12:55 ` Vlastimil Babka
@ 2015-11-23  9:29   ` Michal Hocko
  2015-11-23  9:43     ` Vlastimil Babka
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-11-23  9:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman,
	David Rientjes, linux-mm, LKML

On Sun 22-11-15 13:55:31, Vlastimil Babka wrote:
> On 11.11.2015 14:48, mhocko@kernel.org wrote:
> >  mm/page_alloc.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 8034909faad2..d30bce9d7ac8 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  			goto out;
> >  	}
> >  	/* Exhausted what can be done so it's blamo time */
> > -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> > +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> >  		*did_some_progress = 1;
> > +
> > +		if (gfp_mask & __GFP_NOFAIL) {
> > +			page = get_page_from_freelist(gfp_mask, order,
> > +					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> > +			WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
> > +				    " Consider increasing min_free_kbytes.\n");
> 
> It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part?

They are warning about two different things. The first one catches a
buggy code which uses __GFP_NOFAIL from oom disabled context while the
second one tries to help the administrator with a hint that memory
reserves are too small.

> Also s/gfp_nofail/GFP_NOFAIL/ for consistency?

Fair enough, changed.

> Hm and probably out of scope of your patch, but I understand the WARN_ONCE
> (WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping
> here. But for distinct tasks and potentially far away in time, wouldn't we want
> to see all the warnings? Would that be feasible to implement?

I was thinking about that as well some time ago but it was quite
hard to find a good enough API to tell when to warn again. The first
WARN_ON_ONCE should trigger for all different _code paths_ no matter
how frequently they appear to catch all the buggy callers. The second
one would benefit from a new warning after min_free_kbytes was updated
because it would tell the administrator that the last update was not
sufficient for the workload.

> 
> > +		}
> > +	}
> >  out:
> >  	mutex_unlock(&oom_lock);
> >  	return page;
> > 

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-23  9:29   ` Michal Hocko
@ 2015-11-23  9:43     ` Vlastimil Babka
  2015-11-23 10:13       ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2015-11-23  9:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman,
	David Rientjes, linux-mm, LKML

On 11/23/2015 10:29 AM, Michal Hocko wrote:
> On Sun 22-11-15 13:55:31, Vlastimil Babka wrote:
>> On 11.11.2015 14:48, mhocko@kernel.org wrote:
>>>   mm/page_alloc.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 8034909faad2..d30bce9d7ac8 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>>>   			goto out;
>>>   	}
>>>   	/* Exhausted what can be done so it's blamo time */
>>> -	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
>>> +	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
>>>   		*did_some_progress = 1;
>>> +
>>> +		if (gfp_mask & __GFP_NOFAIL) {
>>> +			page = get_page_from_freelist(gfp_mask, order,
>>> +					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
>>> +			WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
>>> +				    " Consider increasing min_free_kbytes.\n");
>>
>> It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part?
>
> They are warning about two different things. The first one catches a
> buggy code which uses __GFP_NOFAIL from oom disabled context while the

Ah, I see, I misinterpreted what the return values of out_of_memory() 
mean. But now that I look at its code, it seems to only return false 
when oom_killer_disabled is set to true. Which is a global thing and 
nothing to do with the context of the __GFP_NOFAIL allocation?

> second one tries to help the administrator with a hint that memory
> reserves are too small.
>
>> Also s/gfp_nofail/GFP_NOFAIL/ for consistency?
>
> Fair enough, changed.
>
>> Hm and probably out of scope of your patch, but I understand the WARN_ONCE
>> (WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping
>> here. But for distinct tasks and potentially far away in time, wouldn't we want
>> to see all the warnings? Would that be feasible to implement?
>
> I was thinking about that as well some time ago but it was quite
> hard to find a good enough API to tell when to warn again. The first
> WARN_ON_ONCE should trigger for all different _code paths_ no matter
> how frequently they appear to catch all the buggy callers. The second
> one would benefit from a new warning after min_free_kbytes was updated
> because it would tell the administrator that the last update was not
> sufficient for the workload.

Hm, what about adding a flag to the struct alloc_context, so that when 
the particular allocation attempt emits the warning, it sets a flag in 
the alloc_context so that it won't emit them again as long as it keeps 
looping and attempting oom. Other allocations will warn independently.

We could also print the same info as the "allocation failed" warnings 
do, since it's very similar, except we can't fail - but the admin/bug 
reporter should be interested in the same details as for an allocation 
failure that is allowed to fail. But it's also true that we have 
probably just printed the info during out_of_memory()... except when we 
skipped that for some reason?

>>
>>> +		}
>>> +	}
>>>   out:
>>>   	mutex_unlock(&oom_lock);
>>>   	return page;
>>>
>
> Thanks!
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-23  9:43     ` Vlastimil Babka
@ 2015-11-23 10:13       ` Michal Hocko
  2015-11-23 21:26         ` David Rientjes
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-11-23 10:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Johannes Weiner, Andrea Arcangeli, Mel Gorman,
	David Rientjes, linux-mm, LKML

On Mon 23-11-15 10:43:42, Vlastimil Babka wrote:
> On 11/23/2015 10:29 AM, Michal Hocko wrote:
> >On Sun 22-11-15 13:55:31, Vlastimil Babka wrote:
> >>On 11.11.2015 14:48, mhocko@kernel.org wrote:
> >>>  mm/page_alloc.c | 10 +++++++++-
> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>index 8034909faad2..d30bce9d7ac8 100644
> >>>--- a/mm/page_alloc.c
> >>>+++ b/mm/page_alloc.c
> >>>@@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >>>  			goto out;
> >>>  	}
> >>>  	/* Exhausted what can be done so it's blamo time */
> >>>-	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> >>>+	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> >>>  		*did_some_progress = 1;
> >>>+
> >>>+		if (gfp_mask & __GFP_NOFAIL) {
> >>>+			page = get_page_from_freelist(gfp_mask, order,
> >>>+					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> >>>+			WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
> >>>+				    " Consider increasing min_free_kbytes.\n");
> >>
> >>It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part?
> >
> >They are warning about two different things. The first one catches a
> >buggy code which uses __GFP_NOFAIL from oom disabled context while the
> 
> Ah, I see, I misinterpreted what the return values of out_of_memory() mean.
> But now that I look at its code, it seems to only return false when
> oom_killer_disabled is set to true. Which is a global thing and nothing to
> do with the context of the __GFP_NOFAIL allocation?

I am not sure I follow you here. The point of the warning is to warn
when the oom killer is disbaled (out_of_memory returns false) _and_ the
request is __GFP_NOFAIL because we simply cannot guarantee any forward
progress and just a use of the allocation flag is not supproted.

[...]
> >>Hm and probably out of scope of your patch, but I understand the WARN_ONCE
> >>(WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task looping
> >>here. But for distinct tasks and potentially far away in time, wouldn't we want
> >>to see all the warnings? Would that be feasible to implement?
> >
> >I was thinking about that as well some time ago but it was quite
> >hard to find a good enough API to tell when to warn again. The first
> >WARN_ON_ONCE should trigger for all different _code paths_ no matter
> >how frequently they appear to catch all the buggy callers. The second
> >one would benefit from a new warning after min_free_kbytes was updated
> >because it would tell the administrator that the last update was not
> >sufficient for the workload.
> 
> Hm, what about adding a flag to the struct alloc_context, so that when the
> particular allocation attempt emits the warning, it sets a flag in the
> alloc_context so that it won't emit them again as long as it keeps looping
> and attempting oom. Other allocations will warn independently.

That could still trigger a flood of messages. Say you have many
concurrent users from the same call path...
 
I am not really sure making the code more complicating for this warning
is really worth it. If anything we can use ratelimited variant.

> We could also print the same info as the "allocation failed" warnings do,
> since it's very similar, except we can't fail - but the admin/bug reporter
> should be interested in the same details as for an allocation failure that
> is allowed to fail. But it's also true that we have probably just printed
> the info during out_of_memory()... except when we skipped that for some
> reason?

The first WARN_ON_ONCE happens when OOM killer doesn't trigger so a
memory situation might be worth considering. The later one might have
seen the OOM report which is the likely case. So if anyting the first
one should dump the info.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-23 10:13       ` Michal Hocko
@ 2015-11-23 21:26         ` David Rientjes
  2015-11-24  9:47           ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2015-11-23 21:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Andrea Arcangeli,
	Mel Gorman, linux-mm, LKML

On Mon, 23 Nov 2015, Michal Hocko wrote:

> > >>>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > >>>index 8034909faad2..d30bce9d7ac8 100644
> > >>>--- a/mm/page_alloc.c
> > >>>+++ b/mm/page_alloc.c
> > >>>@@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > >>>  			goto out;
> > >>>  	}
> > >>>  	/* Exhausted what can be done so it's blamo time */
> > >>>-	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> > >>>+	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > >>>  		*did_some_progress = 1;
> > >>>+
> > >>>+		if (gfp_mask & __GFP_NOFAIL) {
> > >>>+			page = get_page_from_freelist(gfp_mask, order,
> > >>>+					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
> > >>>+			WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
> > >>>+				    " Consider increasing min_free_kbytes.\n");
> > >>
> > >>It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part?
> > >
> > >They are warning about two different things. The first one catches a
> > >buggy code which uses __GFP_NOFAIL from oom disabled context while the
> > 
> > Ah, I see, I misinterpreted what the return values of out_of_memory() mean.
> > But now that I look at its code, it seems to only return false when
> > oom_killer_disabled is set to true. Which is a global thing and nothing to
> > do with the context of the __GFP_NOFAIL allocation?
> 
> I am not sure I follow you here. The point of the warning is to warn
> when the oom killer is disbaled (out_of_memory returns false) _and_ the
> request is __GFP_NOFAIL because we simply cannot guarantee any forward
> progress and just a use of the allocation flag is not supproted.
> 

I don't think the WARN_ONCE() above is helpful for a few reasons:

 - it suggests that min_free_kbytes is the best way to work around such 
   issues and gives kernel developers a free pass to just say "raise
   min_free_kbytes" rather than reducing their reliance on __GFP_NOFAIL,

 - raising min_free_kbytes is not immediately actionable without memory
   freeing to fix any oom issue, and

 - it relies on the earlier warning to dump the state of memory and 
   doesn't add any significant information to help understand how seperate
   occurrences are similar or different.

I think the WARN_ONCE() should just be removed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-23 21:26         ` David Rientjes
@ 2015-11-24  9:47           ` Michal Hocko
  2015-11-24 16:26             ` Johannes Weiner
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-11-24  9:47 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Andrea Arcangeli,
	Mel Gorman, linux-mm, LKML

On Mon 23-11-15 13:26:49, David Rientjes wrote:
> On Mon, 23 Nov 2015, Michal Hocko wrote:
[...]
> > I am not sure I follow you here. The point of the warning is to warn
> > when the oom killer is disbaled (out_of_memory returns false) _and_ the
> > request is __GFP_NOFAIL because we simply cannot guarantee any forward
> > progress and just a use of the allocation flag is not supproted.
> > 
> 
> I don't think the WARN_ONCE() above is helpful for a few reasons:
> 
>  - it suggests that min_free_kbytes is the best way to work around such 
>    issues and gives kernel developers a free pass to just say "raise
>    min_free_kbytes" rather than reducing their reliance on __GFP_NOFAIL,

I disagree. Users are quite sensitive to warnings with backtraces in the
log from my experience and they report them. And while the log shows the
code path which triggers the issue which can help us to change the code
it also gives a useful hint on how to reduce this issue until we are
able to either fix a bug or a permanent configuration if we are not able
to get rid of it for whatever reason.

Besides that there is no other reliable warning that we are getting
_really_ short on memory unlike when the allocation failure is
allowed. OOM killer report might be missing because there was no actual
killing happening.
 
>  - raising min_free_kbytes is not immediately actionable without memory
>    freeing to fix any oom issue, and

true but it can be done to reduce chances for the issue to reappear.

>  - it relies on the earlier warning to dump the state of memory and 
>    doesn't add any significant information to help understand how seperate
>    occurrences are similar or different.

The information is quite valuable even without OOM killer report IMHO.
 
> I think the WARN_ONCE() should just be removed.

I do not insist on keeping it but I really think it might be useful
while it doesn't seem to cause any confusion IMHO. So unless there is a
strong reason to not include it I would prefer keeping it.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-24  9:47           ` Michal Hocko
@ 2015-11-24 16:26             ` Johannes Weiner
  2015-11-24 17:02               ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2015-11-24 16:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli,
	Mel Gorman, linux-mm, LKML

On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote:
> Besides that there is no other reliable warning that we are getting
> _really_ short on memory unlike when the allocation failure is
> allowed. OOM killer report might be missing because there was no actual
> killing happening.

This is why I would like to see that warning generalized, and not just
for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL
that will loop forever in the allocator, and when this deadlocks the
machine all we see is other tasks hanging, but not the culprit. If we
were to get a backtrace of some task in the allocator that is known to
hold locks, suddenly all the other hung tasks will make sense, and it
will clearly distinguish such an allocator deadlock from other issues.

Do you remember the patch you proposed at LSF about failing requests
after looping a certain (configurable) number of times? Well, instead
of failing them, it would be good to start WARNING after a certain #
of loops when we know we won't quit (implicit or explicit NOFAIL).

[ Kind of like fs/xfs/kmem::kmem_alloc() does, only that that is
  currently dead code due to our looping inside the allocator. ]

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-24 16:26             ` Johannes Weiner
@ 2015-11-24 17:02               ` Michal Hocko
  2015-11-24 19:57                 ` Johannes Weiner
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-11-24 17:02 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli,
	Mel Gorman, linux-mm, LKML

On Tue 24-11-15 11:26:04, Johannes Weiner wrote:
> On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote:
> > Besides that there is no other reliable warning that we are getting
> > _really_ short on memory unlike when the allocation failure is
> > allowed. OOM killer report might be missing because there was no actual
> > killing happening.
> 
> This is why I would like to see that warning generalized, and not just
> for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL
> that will loop forever in the allocator,

Yes but does it make sense to warn for all of them? Wouldn't it be
sufficient to warn about those which cannot allocate anything even
though they are doing ALLOC_NO_WATERMARKS? We could still hint the
administrator to increase min_free_kbytes for his particular workload.
Such a situation should be really exceptional and warn_once should be
sufficient.

> and when this deadlocks the
> machine all we see is other tasks hanging, but not the culprit. If we
> were to get a backtrace of some task in the allocator that is known to
> hold locks, suddenly all the other hung tasks will make sense, and it
> will clearly distinguish such an allocator deadlock from other issues.

Tetsuo was suggesting a more sophisticated infrastructure for tracking
allocations [1] which take too long without making progress. I haven't
seen his patch because I was too busy with other stuff but maybe this is
what you would like to see?

Anyway I would like to make some progress on this patch. Do you think
that it would be acceptable in the current form without the warning or
you preffer a different way?

[1] http://lkml.kernel.org/r/201510182105.AGA00839.FHVFFStLQOMOOJ%40I-love.SAKURA.ne.jp
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-24 17:02               ` Michal Hocko
@ 2015-11-24 19:57                 ` Johannes Weiner
  2015-11-25  9:33                   ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2015-11-24 19:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli,
	Mel Gorman, linux-mm, LKML

On Tue, Nov 24, 2015 at 06:02:39PM +0100, Michal Hocko wrote:
> On Tue 24-11-15 11:26:04, Johannes Weiner wrote:
> > On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote:
> > > Besides that there is no other reliable warning that we are getting
> > > _really_ short on memory unlike when the allocation failure is
> > > allowed. OOM killer report might be missing because there was no actual
> > > killing happening.
> > 
> > This is why I would like to see that warning generalized, and not just
> > for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL
> > that will loop forever in the allocator,
> 
> Yes but does it make sense to warn for all of them? Wouldn't it be
> sufficient to warn about those which cannot allocate anything even
> though they are doing ALLOC_NO_WATERMARKS?

Why is it important whether they can do ALLOC_NO_WATERMARKS or not?

I'm worried about all those that can loop forever with locks held.

> > and when this deadlocks the
> > machine all we see is other tasks hanging, but not the culprit. If we
> > were to get a backtrace of some task in the allocator that is known to
> > hold locks, suddenly all the other hung tasks will make sense, and it
> > will clearly distinguish such an allocator deadlock from other issues.
> 
> Tetsuo was suggesting a more sophisticated infrastructure for tracking
> allocations [1] which take too long without making progress. I haven't
> seen his patch because I was too busy with other stuff but maybe this is
> what you would like to see?

That seems a bit excessive. I was thinking something more like this:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 05ef7fb..fbfc581 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3004,6 +3004,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum migrate_mode migration_mode = MIGRATE_ASYNC;
 	bool deferred_compaction = false;
 	int contended_compaction = COMPACT_CONTENDED_NONE;
+	unsigned int nr_tries = 0;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3033,6 +3034,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 retry:
+	if (++nr_retries % 100 == 0)
+		warn_alloc_failed(gfp_mask, order, "Potential GFP deadlock\n");
+
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
 
> Anyway I would like to make some progress on this patch. Do you think
> that it would be acceptable in the current form without the warning or
> you preffer a different way?

Oh, I have nothing against your patch, please go ahead with it. I just
wondered out loud when you proposed a warning about deadlocking NOFAIL
allocations but limited it to explicit __GFP_NOFAIL allocations, when
those obviously aren't the only ones that can deadlock in that way.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
  2015-11-24 19:57                 ` Johannes Weiner
@ 2015-11-25  9:33                   ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2015-11-25  9:33 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: David Rientjes, Vlastimil Babka, Andrew Morton, Andrea Arcangeli,
	Mel Gorman, linux-mm, LKML

On Tue 24-11-15 14:57:10, Johannes Weiner wrote:
> On Tue, Nov 24, 2015 at 06:02:39PM +0100, Michal Hocko wrote:
> > On Tue 24-11-15 11:26:04, Johannes Weiner wrote:
> > > On Tue, Nov 24, 2015 at 10:47:09AM +0100, Michal Hocko wrote:
> > > > Besides that there is no other reliable warning that we are getting
> > > > _really_ short on memory unlike when the allocation failure is
> > > > allowed. OOM killer report might be missing because there was no actual
> > > > killing happening.
> > > 
> > > This is why I would like to see that warning generalized, and not just
> > > for __GFP_NOFAIL. We have allocations other than explicit __GFP_NOFAIL
> > > that will loop forever in the allocator,
> > 
> > Yes but does it make sense to warn for all of them? Wouldn't it be
> > sufficient to warn about those which cannot allocate anything even
> > though they are doing ALLOC_NO_WATERMARKS?
> 
> Why is it important whether they can do ALLOC_NO_WATERMARKS or not?

Well, the idea was that ALLOC_NO_WATERMARKS failures mean that memory
reserves are not sufficient to handle given workload. min_free_kbytes
is auto-tuned and it might be not sufficient - especially now that we
are adding a new class of consumers of the reserves. I find a warning
as an appropriate way to tell administrator that the auto-tuning was
too optimistic for the particular load.

> I'm worried about all those that can loop forever with locks held.

I can see your point here but I think that looping endlessly without
any progress is a different class of issue. It is hard to tune for it.
You can change tunning and still can end up looping because the lock
vs. reclaim dependencies will be still there.

> > > and when this deadlocks the
> > > machine all we see is other tasks hanging, but not the culprit. If we
> > > were to get a backtrace of some task in the allocator that is known to
> > > hold locks, suddenly all the other hung tasks will make sense, and it
> > > will clearly distinguish such an allocator deadlock from other issues.
> > 
> > Tetsuo was suggesting a more sophisticated infrastructure for tracking
> > allocations [1] which take too long without making progress. I haven't
> > seen his patch because I was too busy with other stuff but maybe this is
> > what you would like to see?
> 
> That seems a bit excessive. I was thinking something more like this:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 05ef7fb..fbfc581 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3004,6 +3004,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	enum migrate_mode migration_mode = MIGRATE_ASYNC;
>  	bool deferred_compaction = false;
>  	int contended_compaction = COMPACT_CONTENDED_NONE;
> +	unsigned int nr_tries = 0;
>  
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3033,6 +3034,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		goto nopage;
>  
>  retry:
> +	if (++nr_retries % 100 == 0)
> +		warn_alloc_failed(gfp_mask, order, "Potential GFP deadlock\n");
> +

I am not against this in principle. It might be too noisy but
warn_alloc_failed is throttled already so it should handle too many
parallel requests already. I still think that ALLOC_NO_WATERMARKS
failures deserve a special treatment because we can tune for that.
Care to send a patch?

>  	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
>  		wake_all_kswapds(order, ac);
>  
> > Anyway I would like to make some progress on this patch. Do you think
> > that it would be acceptable in the current form without the warning or
> > you preffer a different way?
> 
> Oh, I have nothing against your patch, please go ahead with it. I just
> wondered out loud when you proposed a warning about deadlocking NOFAIL
> allocations but limited it to explicit __GFP_NOFAIL allocations, when
> those obviously aren't the only ones that can deadlock in that way.

As mentioned above I have added it merely because this would be a new
consumer of the reserves which might lead to its depletion without
an explicit warning. But the more I think about that the more I am
convinced that this should be generalized to ALLOC_NO_WATERMARKS.

I will remove the warning from the patch and prepare a separate one
which will warn ALLOC_NO_WATERMARKS so that we can discuss that
separately.

Thanks

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-11-25  9:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-11 13:48 [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko
2015-11-11 15:54 ` Johannes Weiner
2015-11-12  8:51   ` Michal Hocko
2015-11-22 12:55 ` Vlastimil Babka
2015-11-23  9:29   ` Michal Hocko
2015-11-23  9:43     ` Vlastimil Babka
2015-11-23 10:13       ` Michal Hocko
2015-11-23 21:26         ` David Rientjes
2015-11-24  9:47           ` Michal Hocko
2015-11-24 16:26             ` Johannes Weiner
2015-11-24 17:02               ` Michal Hocko
2015-11-24 19:57                 ` Johannes Weiner
2015-11-25  9:33                   ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).