* [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL
@ 2009-05-09 22:46 David Rientjes
2009-05-10 23:42 ` Rik van Riel
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: David Rientjes @ 2009-05-09 22:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Peter Zijlstra, Nick Piggin, Christoph Lameter,
Dave Hansen, linux-kernel
The oom killer must be invoked regardless of the order if the allocation
is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to
free some memory.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/page_alloc.c | 16 ++++++----------
1 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1547,7 +1547,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
goto out;
/* The OOM killer will not help higher order allocs */
- if (order > PAGE_ALLOC_COSTLY_ORDER)
+ if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_NOFAIL))
goto out;
/* Exhausted what can be done so it's blamo time */
@@ -1763,16 +1763,12 @@ rebalance:
migratetype);
if (page)
goto got_pg;
+ if ((gfp_mask & __GFP_NOFAIL) ||
+ order <= PAGE_ALLOC_COSTLY_ORDER)
+ goto restart;
- /*
- * The OOM killer does not trigger for high-order allocations
- * but if no progress is being made, there are no other
- * options and retrying is unlikely to help
- */
- if (order > PAGE_ALLOC_COSTLY_ORDER)
- goto nopage;
-
- goto restart;
+ /* No freeing is possible so just fail */
+ goto nopage;
}
}
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-09 22:46 [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL David Rientjes @ 2009-05-10 23:42 ` Rik van Riel 2009-05-11 7:29 ` Minchan Kim ` (2 subsequent siblings) 3 siblings, 0 replies; 20+ messages in thread From: Rik van Riel @ 2009-05-10 23:42 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Mel Gorman, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel David Rientjes wrote: > The oom killer must be invoked regardless of the order if the allocation > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > free some memory. > > Cc: Mel Gorman <mel@csn.ul.ie> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Nick Piggin <npiggin@suse.de> > Cc: Christoph Lameter <cl@linux-foundation.org> > Cc: Dave Hansen <dave@linux.vnet.ibm.com> > Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Rik van Riel <riel@redhat.com> -- All rights reversed. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-09 22:46 [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL David Rientjes 2009-05-10 23:42 ` Rik van Riel @ 2009-05-11 7:29 ` Minchan Kim 2009-05-11 8:40 ` David Rientjes 2009-05-11 10:40 ` Mel Gorman 2009-05-11 20:40 ` Andrew Morton 3 siblings, 1 reply; 20+ messages in thread From: Minchan Kim @ 2009-05-11 7:29 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Mel Gorman, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel Hi, David. On Sat, 9 May 2009 15:46:39 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > The oom killer must be invoked regardless of the order if the allocation > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > free some memory. > Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, It will go to nopage label in __alloc_pages_slowpath. Then it will show the page allocation failure warning and will return. Retrying depends on caller. So, I think it won't loop forever. Do I miss something ? In addition, the OOM killer can help for getting the high order pages ? -- Kinds Regards Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 7:29 ` Minchan Kim @ 2009-05-11 8:40 ` David Rientjes 2009-05-11 9:12 ` Minchan Kim 0 siblings, 1 reply; 20+ messages in thread From: David Rientjes @ 2009-05-11 8:40 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, Mel Gorman, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Mon, 11 May 2009, Minchan Kim wrote: > Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, > > It will go to nopage label in __alloc_pages_slowpath. > Then it will show the page allocation failure warning and will return. > Retrying depends on caller. > Correct. > So, I think it won't loop forever. > Do I miss something ? > __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag. So failing without attempting to free some memory is the wrong thing to do. > In addition, the OOM killer can help for getting the high order pages ? > Sure, if it selects a task that will free a lot of memory, which is it's goal. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 8:40 ` David Rientjes @ 2009-05-11 9:12 ` Minchan Kim 2009-05-11 11:21 ` Minchan Kim 0 siblings, 1 reply; 20+ messages in thread From: Minchan Kim @ 2009-05-11 9:12 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Mel Gorman, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Mon, May 11, 2009 at 5:40 PM, David Rientjes <rientjes@google.com> wrote: > On Mon, 11 May 2009, Minchan Kim wrote: > >> Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, >> >> It will go to nopage label in __alloc_pages_slowpath. >> Then it will show the page allocation failure warning and will return. >> Retrying depends on caller. >> > > Correct. > >> So, I think it won't loop forever. >> Do I miss something ? >> > > __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag. > So failing without attempting to free some memory is the wrong thing to > do. Thanks for quick reply. I was confused by your description. I thought you suggested we have to prevent loop forever. > >> In addition, the OOM killer can help for getting the high order pages ? >> > > Sure, if it selects a task that will free a lot of memory, which is it's > goal. > How do we know any task have a lot of memory ? If we select wrong task and kill one ? I have a concern about innocent task. -- Kinds regards, Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 9:12 ` Minchan Kim @ 2009-05-11 11:21 ` Minchan Kim 2009-05-11 13:38 ` Mel Gorman 0 siblings, 1 reply; 20+ messages in thread From: Minchan Kim @ 2009-05-11 11:21 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Mel Gorman, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Mon, May 11, 2009 at 6:12 PM, Minchan Kim <minchan.kim@gmail.com> wrote: > On Mon, May 11, 2009 at 5:40 PM, David Rientjes <rientjes@google.com> wrote: >> On Mon, 11 May 2009, Minchan Kim wrote: >> >>> Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, >>> >>> It will go to nopage label in __alloc_pages_slowpath. >>> Then it will show the page allocation failure warning and will return. >>> Retrying depends on caller. >>> >> >> Correct. >> >>> So, I think it won't loop forever. >>> Do I miss something ? >>> >> >> __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag. >> So failing without attempting to free some memory is the wrong thing to >> do. > > Thanks for quick reply. > I was confused by your description. > I thought you suggested we have to prevent loop forever. > >> >>> In addition, the OOM killer can help for getting the high order pages ? >>> >> >> Sure, if it selects a task that will free a lot of memory, which is it's >> goal. >> > > How do we know any task have a lot of memory ? > If we select wrong task and kill one ? > > I have a concern about innocent task. Now, I look over __out_of_memory. For selecting better tasks in case of PAGE_ALLOC_COSTRY_ORDER, How about increasing score of task which have VM_HUGETLB vma in badness ? > -- > Kinds regards, > Minchan Kim > -- Kinds regards, Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 11:21 ` Minchan Kim @ 2009-05-11 13:38 ` Mel Gorman 2009-05-11 14:00 ` Minchan Kim 0 siblings, 1 reply; 20+ messages in thread From: Mel Gorman @ 2009-05-11 13:38 UTC (permalink / raw) To: Minchan Kim Cc: David Rientjes, Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Mon, May 11, 2009 at 08:21:21PM +0900, Minchan Kim wrote: > On Mon, May 11, 2009 at 6:12 PM, Minchan Kim <minchan.kim@gmail.com> wrote: > > On Mon, May 11, 2009 at 5:40 PM, David Rientjes <rientjes@google.com> wrote: > >> On Mon, 11 May 2009, Minchan Kim wrote: > >> > >>> Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, > >>> > >>> It will go to nopage label in __alloc_pages_slowpath. > >>> Then it will show the page allocation failure warning and will return. > >>> Retrying depends on caller. > >>> > >> > >> Correct. > >> > >>> So, I think it won't loop forever. > >>> Do I miss something ? > >>> > >> > >> __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag. > >> So failing without attempting to free some memory is the wrong thing to > >> do. > > > > Thanks for quick reply. > > I was confused by your description. > > I thought you suggested we have to prevent loop forever. > > > >> > >>> In addition, the OOM killer can help for getting the high order pages ? > >>> > >> > >> Sure, if it selects a task that will free a lot of memory, which is it's > >> goal. > >> > > > > How do we know any task have a lot of memory ? > > If we select wrong task and kill one ? > > > > I have a concern about innocent task. > > Now, I look over __out_of_memory. > For selecting better tasks in case of PAGE_ALLOC_COSTRY_ORDER, How > about increasing score of task which have VM_HUGETLB vma in badness ? > That is unjustified. It penalises a process even if it only allocated one hugepage and it is not a reflection of how much memory the process is using or how badly behaved it is. Even worse, if the huge page was allocated from the static hugepage pool then the hugepages are freed to the hugepage pool and not the page allocator when the process is killed. This means that killing a process using hugepages does not necessarily help applications requiring more memory unless they also want hugepages. However, a hugepage allocation will not trigger the OOM killer so killing processes using hugepages still does not help. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 13:38 ` Mel Gorman @ 2009-05-11 14:00 ` Minchan Kim 2009-05-11 19:32 ` David Rientjes 2009-05-11 21:32 ` Mel Gorman 0 siblings, 2 replies; 20+ messages in thread From: Minchan Kim @ 2009-05-11 14:00 UTC (permalink / raw) To: Mel Gorman Cc: David Rientjes, Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel Hi, Mel. On Mon, May 11, 2009 at 10:38 PM, Mel Gorman <mel@csn.ul.ie> wrote: > On Mon, May 11, 2009 at 08:21:21PM +0900, Minchan Kim wrote: >> On Mon, May 11, 2009 at 6:12 PM, Minchan Kim <minchan.kim@gmail.com> wrote: >> > On Mon, May 11, 2009 at 5:40 PM, David Rientjes <rientjes@google.com> wrote: >> >> On Mon, 11 May 2009, Minchan Kim wrote: >> >> >> >>> Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, >> >>> >> >>> It will go to nopage label in __alloc_pages_slowpath. >> >>> Then it will show the page allocation failure warning and will return. >> >>> Retrying depends on caller. >> >>> >> >> >> >> Correct. >> >> >> >>> So, I think it won't loop forever. >> >>> Do I miss something ? >> >>> >> >> >> >> __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag. >> >> So failing without attempting to free some memory is the wrong thing to >> >> do. >> > >> > Thanks for quick reply. >> > I was confused by your description. >> > I thought you suggested we have to prevent loop forever. >> > >> >> >> >>> In addition, the OOM killer can help for getting the high order pages ? >> >>> >> >> >> >> Sure, if it selects a task that will free a lot of memory, which is it's >> >> goal. >> >> >> > >> > How do we know any task have a lot of memory ? >> > If we select wrong task and kill one ? >> > >> > I have a concern about innocent task. >> >> Now, I look over __out_of_memory. >> For selecting better tasks in case of PAGE_ALLOC_COSTRY_ORDER, How >> about increasing score of task which have VM_HUGETLB vma in badness ? >> > > That is unjustified. It penalises a process even if it only allocated one > hugepage and it is not a reflection of how much memory the process is using > or how badly behaved it is. > Even worse, if the huge page was allocated from the static hugepage pool then > the hugepages are freed to the hugepage pool and not the page allocator when > the process is killed. This means that killing a process using hugepages > does not necessarily help applications requiring more memory unless they > also want hugepages. However, a hugepage allocation will not trigger the > OOM killer so killing processes using hugepages still does not help. Thanks for pointing me. In fact, I expect your great answer. :) So, how do we prevent innocent task killing for allocation of high order page ? I think it is trade off. but at least, we have been prevent it until now. But this patch increases the probability of innocent task killing. Is GFP_NOFAIL's early bailout more important than killing of innocent task ? I am not sure. > -- > Mel Gorman > Part-time Phd Student Linux Technology Center > University of Limerick IBM Dublin Software Lab > -- Kinds regards, Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 14:00 ` Minchan Kim @ 2009-05-11 19:32 ` David Rientjes 2009-05-11 23:48 ` Minchan Kim 2009-05-11 21:32 ` Mel Gorman 1 sibling, 1 reply; 20+ messages in thread From: David Rientjes @ 2009-05-11 19:32 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Mon, 11 May 2009, Minchan Kim wrote: > But this patch increases the probability of innocent task killing. > Is GFP_NOFAIL's early bailout more important than killing of innocent task ? > __GFP_NOFAIL's bailout is, by definition, a bug in the page allocator and this patch fixes it. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 19:32 ` David Rientjes @ 2009-05-11 23:48 ` Minchan Kim 0 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2009-05-11 23:48 UTC (permalink / raw) To: David Rientjes Cc: Mel Gorman, Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Tue, May 12, 2009 at 4:32 AM, David Rientjes <rientjes@google.com> wrote: > On Mon, 11 May 2009, Minchan Kim wrote: > >> But this patch increases the probability of innocent task killing. >> Is GFP_NOFAIL's early bailout more important than killing of innocent task ? >> > > __GFP_NOFAIL's bailout is, by definition, a bug in the page allocator and > this patch fixes it. > What I mean is If fortunately other process have a lot of memory is exited while looping, __GFP_NOFAIL can return. I have no objection about your patch. Just, I have a concern of killing innocent task. As Mel and Andrew pointed out, fundamentally, __GFOP_NOFAIL and high-order allocation itself is bad code. Okay. we have to insert WARN_ON properly. Thanks for good contribution. :) -- Kinds regards, Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 14:00 ` Minchan Kim 2009-05-11 19:32 ` David Rientjes @ 2009-05-11 21:32 ` Mel Gorman 2009-05-11 23:41 ` Minchan Kim 1 sibling, 1 reply; 20+ messages in thread From: Mel Gorman @ 2009-05-11 21:32 UTC (permalink / raw) To: Minchan Kim Cc: David Rientjes, Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Mon, May 11, 2009 at 11:00:44PM +0900, Minchan Kim wrote: > Hi, Mel. > > On Mon, May 11, 2009 at 10:38 PM, Mel Gorman <mel@csn.ul.ie> wrote: > > On Mon, May 11, 2009 at 08:21:21PM +0900, Minchan Kim wrote: > >> On Mon, May 11, 2009 at 6:12 PM, Minchan Kim <minchan.kim@gmail.com> wrote: > >> > On Mon, May 11, 2009 at 5:40 PM, David Rientjes <rientjes@google.com> wrote: > >> >> On Mon, 11 May 2009, Minchan Kim wrote: > >> >> > >> >>> Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, > >> >>> > >> >>> It will go to nopage label in __alloc_pages_slowpath. > >> >>> Then it will show the page allocation failure warning and will return. > >> >>> Retrying depends on caller. > >> >>> > >> >> > >> >> Correct. > >> >> > >> >>> So, I think it won't loop forever. > >> >>> Do I miss something ? > >> >>> > >> >> > >> >> __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag. > >> >> So failing without attempting to free some memory is the wrong thing to > >> >> do. > >> > > >> > Thanks for quick reply. > >> > I was confused by your description. > >> > I thought you suggested we have to prevent loop forever. > >> > > >> >> > >> >>> In addition, the OOM killer can help for getting the high order pages ? > >> >>> > >> >> > >> >> Sure, if it selects a task that will free a lot of memory, which is it's > >> >> goal. > >> >> > >> > > >> > How do we know any task have a lot of memory ? > >> > If we select wrong task and kill one ? > >> > > >> > I have a concern about innocent task. > >> > >> Now, I look over __out_of_memory. > >> For selecting better tasks in case of PAGE_ALLOC_COSTRY_ORDER, How > >> about increasing score of task which have VM_HUGETLB vma in badness ? > >> > > > > That is unjustified. It penalises a process even if it only allocated one > > hugepage and it is not a reflection of how much memory the process is using > > or how badly behaved it is. > > Even worse, if the huge page was allocated from the static hugepage pool then > > the hugepages are freed to the hugepage pool and not the page allocator when > > the process is killed. This means that killing a process using hugepages > > does not necessarily help applications requiring more memory unless they > > also want hugepages. However, a hugepage allocation will not trigger the > > OOM killer so killing processes using hugepages still does not help. > > Thanks for pointing me. > In fact, I expect your great answer. :) > > So, how do we prevent innocent task killing for allocation of high order page ? Not by targetting users of hugepages anyway, that's for sure. My expectation normally for a high-order allocation failing is for the caller to recover from the situation gracefully. In the event it can't, the caller is running a major risk and I would question why it's __GFP_NOFAIL. I recognise that this is not much of an answer. I haven't read all the related threads so I don't know what application is depending so heavily on high-order allocations succeeding that it warrented __GFP_NOFAIL and couldn't be addressed in some other fashion like vmalloc(). Killing a process allocating huges will only help another process requiring hugepages. Unless dynamic hugepage pool resizing was used, the pages freed are not usable for normal high-order allocations so teaching the OOM killer to target those processes is unlikely to help solve whatever problem is being addressed. > I think it is trade off. but at least, we have been prevent it until now. > > But this patch increases the probability of innocent task killing. I think any increase in probability is minimal. When it gets down to it, there should be zero costly-high-order allocations that are also __GFP_NOFAIL. If anything, the patch would show up as OOM-kill pointing out what caller needs to be fixed as opposed to having apparently infinite loops in the page allocator. > Is GFP_NOFAIL's early bailout more important than killing of innocent task ? > In my opinion, yes, in the sense that a OOM-kill report is easier to diagnose than an infinite loop. > I am not sure. > > > -- > > Mel Gorman > > Part-time Phd Student Linux Technology Center > > University of Limerick IBM Dublin Software Lab > > > > > > -- > Kinds regards, > Minchan Kim > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 21:32 ` Mel Gorman @ 2009-05-11 23:41 ` Minchan Kim 0 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2009-05-11 23:41 UTC (permalink / raw) To: Mel Gorman Cc: David Rientjes, Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Tue, May 12, 2009 at 6:32 AM, Mel Gorman <mel@csn.ul.ie> wrote: > On Mon, May 11, 2009 at 11:00:44PM +0900, Minchan Kim wrote: >> Hi, Mel. >> >> On Mon, May 11, 2009 at 10:38 PM, Mel Gorman <mel@csn.ul.ie> wrote: >> > On Mon, May 11, 2009 at 08:21:21PM +0900, Minchan Kim wrote: >> >> On Mon, May 11, 2009 at 6:12 PM, Minchan Kim <minchan.kim@gmail.com> wrote: >> >> > On Mon, May 11, 2009 at 5:40 PM, David Rientjes <rientjes@google.com> wrote: >> >> >> On Mon, 11 May 2009, Minchan Kim wrote: >> >> >> >> >> >>> Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTRY_ORDER, >> >> >>> >> >> >>> It will go to nopage label in __alloc_pages_slowpath. >> >> >>> Then it will show the page allocation failure warning and will return. >> >> >>> Retrying depends on caller. >> >> >>> >> >> >> >> >> >> Correct. >> >> >> >> >> >>> So, I think it won't loop forever. >> >> >>> Do I miss something ? >> >> >>> >> >> >> >> >> >> __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag. >> >> >> So failing without attempting to free some memory is the wrong thing to >> >> >> do. >> >> > >> >> > Thanks for quick reply. >> >> > I was confused by your description. >> >> > I thought you suggested we have to prevent loop forever. >> >> > >> >> >> >> >> >>> In addition, the OOM killer can help for getting the high order pages ? >> >> >>> >> >> >> >> >> >> Sure, if it selects a task that will free a lot of memory, which is it's >> >> >> goal. >> >> >> >> >> > >> >> > How do we know any task have a lot of memory ? >> >> > If we select wrong task and kill one ? >> >> > >> >> > I have a concern about innocent task. >> >> >> >> Now, I look over __out_of_memory. >> >> For selecting better tasks in case of PAGE_ALLOC_COSTRY_ORDER, How >> >> about increasing score of task which have VM_HUGETLB vma in badness ? >> >> >> > >> > That is unjustified. It penalises a process even if it only allocated one >> > hugepage and it is not a reflection of how much memory the process is using >> > or how badly behaved it is. >> > Even worse, if the huge page was allocated from the static hugepage pool then >> > the hugepages are freed to the hugepage pool and not the page allocator when >> > the process is killed. This means that killing a process using hugepages >> > does not necessarily help applications requiring more memory unless they >> > also want hugepages. However, a hugepage allocation will not trigger the >> > OOM killer so killing processes using hugepages still does not help. >> >> Thanks for pointing me. >> In fact, I expect your great answer. :) >> >> So, how do we prevent innocent task killing for allocation of high order page ? > > Not by targetting users of hugepages anyway, that's for sure. My expectation > normally for a high-order allocation failing is for the caller to recover > from the situation gracefully. In the event it can't, the caller is running > a major risk and I would question why it's __GFP_NOFAIL. I agree. > I recognise that this is not much of an answer. I haven't read all the > related threads so I don't know what application is depending so heavily on > high-order allocations succeeding that it warrented __GFP_NOFAIL and couldn't > be addressed in some other fashion like vmalloc(). > > Killing a process allocating huges will only help another process requiring > hugepages. Unless dynamic hugepage pool resizing was used, the pages freed > are not usable for normal high-order allocations so teaching the OOM > killer to target those processes is unlikely to help solve whatever > problem is being addressed. > >> I think it is trade off. but at least, we have been prevent it until now. >> >> But this patch increases the probability of innocent task killing. > > I think any increase in probability is minimal. When it gets down to it, there > should be zero costly-high-order allocations that are also __GFP_NOFAIL. If > anything, the patch would show up as OOM-kill pointing out what caller needs to > be fixed as opposed to having apparently infinite loops in the page allocator. OK. >> Is GFP_NOFAIL's early bailout more important than killing of innocent task ? >> > > In my opinion, yes, in the sense that a OOM-kill report is easier to diagnose > than an infinite loop. Yes. I agree, too. Thanks for great review, Mel. :) -- Kinds regards, Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-09 22:46 [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL David Rientjes 2009-05-10 23:42 ` Rik van Riel 2009-05-11 7:29 ` Minchan Kim @ 2009-05-11 10:40 ` Mel Gorman 2009-05-11 19:37 ` David Rientjes 2009-05-11 20:40 ` Andrew Morton 3 siblings, 1 reply; 20+ messages in thread From: Mel Gorman @ 2009-05-11 10:40 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Sat, May 09, 2009 at 03:46:39PM -0700, David Rientjes wrote: > The oom killer must be invoked regardless of the order if the allocation > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > free some memory. > > Cc: Mel Gorman <mel@csn.ul.ie> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Nick Piggin <npiggin@suse.de> > Cc: Christoph Lameter <cl@linux-foundation.org> > Cc: Dave Hansen <dave@linux.vnet.ibm.com> > Signed-off-by: David Rientjes <rientjes@google.com> > --- > mm/page_alloc.c | 16 ++++++---------- > 1 files changed, 6 insertions(+), 10 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1547,7 +1547,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > goto out; > > /* The OOM killer will not help higher order allocs */ > - if (order > PAGE_ALLOC_COSTLY_ORDER) > + if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_NOFAIL)) > goto out; > > /* Exhausted what can be done so it's blamo time */ > @@ -1763,16 +1763,12 @@ rebalance: > migratetype); > if (page) > goto got_pg; > + if ((gfp_mask & __GFP_NOFAIL) || > + order <= PAGE_ALLOC_COSTLY_ORDER) > + goto restart; > Can this get a comment explaining the intention like the code you are removing had? Something like this? /* * The OOM killer only triggers for lower order allocation * requests and when __GFP_NOFAIL is specified to prevent * endlessly looping. Only retry the allocation if the OOM * killer was used. */ Otherwise, it looks ok. Acked-by: Mel Gorman <mel@csn.ul.ie> One related question, what is doing a costly-high-order allocation request with __GFP_NOFAIL specified? > - /* > - * The OOM killer does not trigger for high-order allocations > - * but if no progress is being made, there are no other > - * options and retrying is unlikely to help > - */ > - if (order > PAGE_ALLOC_COSTLY_ORDER) > - goto nopage; > - > - goto restart; > + /* No freeing is possible so just fail */ > + goto nopage; > } > } > > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 10:40 ` Mel Gorman @ 2009-05-11 19:37 ` David Rientjes 0 siblings, 0 replies; 20+ messages in thread From: David Rientjes @ 2009-05-11 19:37 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Peter Zijlstra, Nick Piggin, Christoph Lameter, Dave Hansen, linux-kernel On Mon, 11 May 2009, Mel Gorman wrote: > Can this get a comment explaining the intention like the code you are > removing had? Something like this? > > /* > * The OOM killer only triggers for lower order allocation > * requests and when __GFP_NOFAIL is specified to prevent > * endlessly looping. Only retry the allocation if the OOM > * killer was used. > */ > > Otherwise, it looks ok. > Sure, I'll add a comment to patch 11 in my latest series that changes the code again in this area. > One related question, what is doing a costly-high-order allocation request > with __GFP_NOFAIL specified? > Nothing to my knowledge, this is simply a bug fix before the big change in my series where we fail page allocations if reclaim fails and the oom killer fails, which previously hasn't been done (we'd loop forever oom killing tasks that may or may not respond). The definition of __GFP_NOFAIL does not make the guarantee of infinitely retrying for only specific ranges of orders. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-09 22:46 [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL David Rientjes ` (2 preceding siblings ...) 2009-05-11 10:40 ` Mel Gorman @ 2009-05-11 20:40 ` Andrew Morton 2009-05-12 12:42 ` Jens Axboe 3 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2009-05-11 20:40 UTC (permalink / raw) To: David Rientjes Cc: mel, a.p.zijlstra, npiggin, cl, dave, linux-kernel, Divy Le Ray, Jens Axboe On Sat, 9 May 2009 15:46:39 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > The oom killer must be invoked regardless of the order if the allocation > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > free some memory. Sigh. We're supposed to be deleting __GFP_NOFAIL. I added it as a way of easily finding lame error-handling-challenged callers which need to be fixed up. So of course we went and added lots more callers. y:/usr/src/linux-2.6.30-rc5> grep -rl GFP_NOFAIL . ./arch/x86/xen/mmu.c ./arch/sparc/kernel/mdesc.c ./mm/page_alloc.c ./mm/failslab.c ./block/cfq-iosched.c ./fs/bio-integrity.c ./fs/ntfs/ChangeLog ./fs/ntfs/malloc.h ./fs/reiserfs/journal.c ./fs/gfs2/meta_io.c ./fs/gfs2/rgrp.c ./fs/gfs2/dir.c ./fs/gfs2/log.c ./fs/jbd/transaction.c ./fs/jbd/journal.c ./fs/jbd2/transaction.c ./fs/jbd2/journal.c ./drivers/net/cxgb3/cxgb3_main.c ./drivers/net/cxgb3/cxgb3_offload.c ./include/linux/slab.h ./include/linux/gfp.h JBD (and hence JBD2) are the original sinners. That net driver should be taught to just handle the allocation failure, please. It's super-uber-bad to be using __GFP_NOFAIL in an IO scheduler! But maybe that's just a brainfart: /* * Inform the allocator of the fact that we will * just repeat this allocation if it fails, to allow * the allocator to do whatever it needs to attempt to * free memory. */ If "we will just repeat this allocation" means what it says then we should use __GFP_NORETRY here, then retry the allocation if it failed. But a) this risks getting stuck in a hot loop in CFQ and b) we really really don't want to be looping infinitely for memory relcaim down in the guts of the block layer! >From my reading, this function is called from get_request_wait(), via rq = get_request(q, rw_flags, bio, GFP_NOIO); so we can't even do pageout here. Jens, this all looks quite risky. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-11 20:40 ` Andrew Morton @ 2009-05-12 12:42 ` Jens Axboe 2009-05-12 13:05 ` Nick Piggin 0 siblings, 1 reply; 20+ messages in thread From: Jens Axboe @ 2009-05-12 12:42 UTC (permalink / raw) To: Andrew Morton Cc: David Rientjes, mel, a.p.zijlstra, npiggin, cl, dave, linux-kernel, Divy Le Ray On Mon, May 11 2009, Andrew Morton wrote: > On Sat, 9 May 2009 15:46:39 -0700 (PDT) > David Rientjes <rientjes@google.com> wrote: > > > The oom killer must be invoked regardless of the order if the allocation > > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > > free some memory. > > Sigh. We're supposed to be deleting __GFP_NOFAIL. I added it as a way > of easily finding lame error-handling-challenged callers which need to > be fixed up. So of course we went and added lots more callers. > > y:/usr/src/linux-2.6.30-rc5> grep -rl GFP_NOFAIL . > ./arch/x86/xen/mmu.c > ./arch/sparc/kernel/mdesc.c > ./mm/page_alloc.c > ./mm/failslab.c > ./block/cfq-iosched.c > ./fs/bio-integrity.c > ./fs/ntfs/ChangeLog > ./fs/ntfs/malloc.h > ./fs/reiserfs/journal.c > ./fs/gfs2/meta_io.c > ./fs/gfs2/rgrp.c > ./fs/gfs2/dir.c > ./fs/gfs2/log.c > ./fs/jbd/transaction.c > ./fs/jbd/journal.c > ./fs/jbd2/transaction.c > ./fs/jbd2/journal.c > ./drivers/net/cxgb3/cxgb3_main.c > ./drivers/net/cxgb3/cxgb3_offload.c > ./include/linux/slab.h > ./include/linux/gfp.h > > JBD (and hence JBD2) are the original sinners. > > That net driver should be taught to just handle the allocation failure, > please. > > > It's super-uber-bad to be using __GFP_NOFAIL in an IO scheduler! But maybe > that's just a brainfart: > > /* > * Inform the allocator of the fact that we will > * just repeat this allocation if it fails, to allow > * the allocator to do whatever it needs to attempt to > * free memory. > */ > > If "we will just repeat this allocation" means what it says then we > should use __GFP_NORETRY here, then retry the allocation if it failed. > But a) this risks getting stuck in a hot loop in CFQ and b) we really > really don't want to be looping infinitely for memory relcaim down in > the guts of the block layer! > > From my reading, this function is called from get_request_wait(), via > > rq = get_request(q, rw_flags, bio, GFP_NOIO); > > so we can't even do pageout here. > > Jens, this all looks quite risky. I agree, it's not all that pretty. That particular piece of code has been there since v3 of CFQ at least, probably even earlier. So it is 2-3 years old at least. I'll see what I can do about improving it. There's no easy solution to this, we can't do any sort of pool backing since cfqq allocations could persist forever. So it's probably better to handle the failure to allocate by just stuffing the request directly on the dispatch queue and just forget about the whole thing. If it happens only once in a blue moon, it doesn't matter. If it happens regularly, not good... -- Jens Axboe ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-12 12:42 ` Jens Axboe @ 2009-05-12 13:05 ` Nick Piggin 2009-05-12 16:37 ` Jens Axboe 0 siblings, 1 reply; 20+ messages in thread From: Nick Piggin @ 2009-05-12 13:05 UTC (permalink / raw) To: Jens Axboe Cc: Andrew Morton, David Rientjes, mel, a.p.zijlstra, cl, dave, linux-kernel, Divy Le Ray On Tue, May 12, 2009 at 02:42:02PM +0200, Jens Axboe wrote: > On Mon, May 11 2009, Andrew Morton wrote: > > On Sat, 9 May 2009 15:46:39 -0700 (PDT) > > David Rientjes <rientjes@google.com> wrote: > > > > > The oom killer must be invoked regardless of the order if the allocation > > > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > > > free some memory. > > > > Sigh. We're supposed to be deleting __GFP_NOFAIL. I added it as a way > > of easily finding lame error-handling-challenged callers which need to > > be fixed up. So of course we went and added lots more callers. > > > > y:/usr/src/linux-2.6.30-rc5> grep -rl GFP_NOFAIL . > > ./fs/bio-integrity.c This is no good either, it seems to be in the bio submission path. It needs a mempool or something. It has a dead code "fallback" that returns an error, but I suspect that's not really acceptable. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-12 13:05 ` Nick Piggin @ 2009-05-12 16:37 ` Jens Axboe 2009-05-12 16:49 ` Nick Piggin 0 siblings, 1 reply; 20+ messages in thread From: Jens Axboe @ 2009-05-12 16:37 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, David Rientjes, mel, a.p.zijlstra, cl, dave, linux-kernel, Divy Le Ray On Tue, May 12 2009, Nick Piggin wrote: > On Tue, May 12, 2009 at 02:42:02PM +0200, Jens Axboe wrote: > > On Mon, May 11 2009, Andrew Morton wrote: > > > On Sat, 9 May 2009 15:46:39 -0700 (PDT) > > > David Rientjes <rientjes@google.com> wrote: > > > > > > > The oom killer must be invoked regardless of the order if the allocation > > > > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > > > > free some memory. > > > > > > Sigh. We're supposed to be deleting __GFP_NOFAIL. I added it as a way > > > of easily finding lame error-handling-challenged callers which need to > > > be fixed up. So of course we went and added lots more callers. > > > > > > y:/usr/src/linux-2.6.30-rc5> grep -rl GFP_NOFAIL . > > > ./fs/bio-integrity.c > > This is no good either, it seems to be in the bio submission path. > > It needs a mempool or something. mempool cannot help here, since the allocation is tied to the process (and IO) life time. > It has a dead code "fallback" that returns an error, but I suspect that's > not really acceptable. It's not that difficult to handle an error there, it just means that we lose any process association with that request. It's mostly making sure that all the bits and pieces deal with that correctly, but it should not be very hard. -- Jens Axboe ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-12 16:37 ` Jens Axboe @ 2009-05-12 16:49 ` Nick Piggin 2009-05-12 17:35 ` Jens Axboe 0 siblings, 1 reply; 20+ messages in thread From: Nick Piggin @ 2009-05-12 16:49 UTC (permalink / raw) To: Jens Axboe Cc: Andrew Morton, David Rientjes, mel, a.p.zijlstra, cl, dave, linux-kernel, Divy Le Ray On Tue, May 12, 2009 at 06:37:30PM +0200, Jens Axboe wrote: > On Tue, May 12 2009, Nick Piggin wrote: > > On Tue, May 12, 2009 at 02:42:02PM +0200, Jens Axboe wrote: > > > On Mon, May 11 2009, Andrew Morton wrote: > > > > On Sat, 9 May 2009 15:46:39 -0700 (PDT) > > > > David Rientjes <rientjes@google.com> wrote: > > > > > > > > > The oom killer must be invoked regardless of the order if the allocation > > > > > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > > > > > free some memory. > > > > > > > > Sigh. We're supposed to be deleting __GFP_NOFAIL. I added it as a way > > > > of easily finding lame error-handling-challenged callers which need to > > > > be fixed up. So of course we went and added lots more callers. > > > > > > > > y:/usr/src/linux-2.6.30-rc5> grep -rl GFP_NOFAIL . > > > > ./fs/bio-integrity.c > > > > This is no good either, it seems to be in the bio submission path. > > > > It needs a mempool or something. > > mempool cannot help here, since the allocation is tied to the process > (and IO) life time. Oh, I was talking about bio-integrity.c... > > It has a dead code "fallback" that returns an error, but I suspect that's > > not really acceptable. > > It's not that difficult to handle an error there, it just means that we > lose any process association with that request. It's mostly making sure > that all the bits and pieces deal with that correctly, but it should not > be very hard. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL 2009-05-12 16:49 ` Nick Piggin @ 2009-05-12 17:35 ` Jens Axboe 0 siblings, 0 replies; 20+ messages in thread From: Jens Axboe @ 2009-05-12 17:35 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, David Rientjes, mel, a.p.zijlstra, cl, dave, linux-kernel, Divy Le Ray On Tue, May 12 2009, Nick Piggin wrote: > On Tue, May 12, 2009 at 06:37:30PM +0200, Jens Axboe wrote: > > On Tue, May 12 2009, Nick Piggin wrote: > > > On Tue, May 12, 2009 at 02:42:02PM +0200, Jens Axboe wrote: > > > > On Mon, May 11 2009, Andrew Morton wrote: > > > > > On Sat, 9 May 2009 15:46:39 -0700 (PDT) > > > > > David Rientjes <rientjes@google.com> wrote: > > > > > > > > > > > The oom killer must be invoked regardless of the order if the allocation > > > > > > is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to > > > > > > free some memory. > > > > > > > > > > Sigh. We're supposed to be deleting __GFP_NOFAIL. I added it as a way > > > > > of easily finding lame error-handling-challenged callers which need to > > > > > be fixed up. So of course we went and added lots more callers. > > > > > > > > > > y:/usr/src/linux-2.6.30-rc5> grep -rl GFP_NOFAIL . > > > > > ./fs/bio-integrity.c > > > > > > This is no good either, it seems to be in the bio submission path. > > > > > > It needs a mempool or something. > > > > mempool cannot help here, since the allocation is tied to the process > > (and IO) life time. > > Oh, I was talking about bio-integrity.c... Oops my bad, I was on the (apparently) cfq case. I agree on the integrity path, a mempool should work fine for that. -- Jens Axboe ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2009-05-12 17:35 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-09 22:46 [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL David Rientjes 2009-05-10 23:42 ` Rik van Riel 2009-05-11 7:29 ` Minchan Kim 2009-05-11 8:40 ` David Rientjes 2009-05-11 9:12 ` Minchan Kim 2009-05-11 11:21 ` Minchan Kim 2009-05-11 13:38 ` Mel Gorman 2009-05-11 14:00 ` Minchan Kim 2009-05-11 19:32 ` David Rientjes 2009-05-11 23:48 ` Minchan Kim 2009-05-11 21:32 ` Mel Gorman 2009-05-11 23:41 ` Minchan Kim 2009-05-11 10:40 ` Mel Gorman 2009-05-11 19:37 ` David Rientjes 2009-05-11 20:40 ` Andrew Morton 2009-05-12 12:42 ` Jens Axboe 2009-05-12 13:05 ` Nick Piggin 2009-05-12 16:37 ` Jens Axboe 2009-05-12 16:49 ` Nick Piggin 2009-05-12 17:35 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox