All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	arjan@infradead.org, linux-kernel@vger.kernel.org,
	Christoph Lameter <cl@linux-foundation.org>,
	Nick Piggin <npiggin@suse.de>
Subject: Re: upcoming kerneloops.org item: get_page_from_freelist
Date: Tue, 30 Jun 2009 21:32:49 +0100	[thread overview]
Message-ID: <20090630203249.GC6689@csn.ul.ie> (raw)
In-Reply-To: <alpine.DEB.2.00.0906301216180.16312@chino.kir.corp.google.com>

On Tue, Jun 30, 2009 at 12:35:59PM -0700, David Rientjes wrote:
> On Tue, 30 Jun 2009, Mel Gorman wrote:
> 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 175a67a..5f4656e 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -230,14 +230,21 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> >  		/*
> >  		 * This task already has access to memory reserves and is
> >  		 * being killed. Don't allow any other task access to the
> > -		 * memory reserve.
> > +		 * memory reserve unless the current process is the one
> > +		 * selected for OOM-killing. If the current process has
> > +		 * been OOM-killed and we are OOM again, another process
> > +		 * needs to be considered for OOM-kill
> >  		 *
> >  		 * Note: this may have a chance of deadlock if it gets
> >  		 * blocked waiting for another task which itself is waiting
> >  		 * for memory. Is there a better alternative?
> >  		 */
> > -		if (test_tsk_thread_flag(p, TIF_MEMDIE))
> > -			return ERR_PTR(-1UL);
> > +		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
> > +			if (p == current)
> > +				continue;
> > +			else
> > +				return ERR_PTR(-1UL);
> > +		}
> >  
> >  		/*
> >  		 * This is in the process of releasing memory so wait for it
> 
> This will panic the machine if current is the only user thread running or 
> eligible for oom kill (all others could have mm's with OOM_DISABLE set).  
> Currently, such tasks can exit or kthreads can free memory so that the oom 
> is recoverable.
> 

Good point, would the following be ok instead?

+           if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+                   if (p == current) {
+			    chosen = ERR_PTR(-1UL);
+                           continue;
+                   } else
+                           return ERR_PTR(-1UL);


> The problem with this approach is that it increases the liklihood that 
> memory reserves will be totally depleted when several threads are 
> competing for them.
> 

How so? 

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5d714f8..5896469 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1539,6 +1539,14 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> >  	if (gfp_mask & __GFP_NORETRY)
> >  		return 0;
> >  
> > +	/* Do not loop if OOM-killed unless __GFP_NOFAIL is specified */
> > +	if (test_thread_flag(TIF_MEMDIE)) {
> > +		if (gfp_mask & __GFP_NOFAIL)
> > +			WARN(1, "Potential infinite loop with __GFP_NOFAIL");
> > +		else
> > +			return 0;
> > +	}
> > +
> 
> There's a second bug in the refactored page allocator: when the oom killer 
> is invoked and it happens to kill current, alloc_flags is never updated 
> because it loops back to `restart', which is past gfp_to_alloc_flags().
> 
> When that is fixed with the patch below, gfp_mask & __GFP_NOFAIL will 
> never be true here in your scenario, where the oom killer kills current, 
> because __alloc_pages_high_priority() will infinitely loop.
> 
> >  	/*
> >  	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
> >  	 * means __GFP_NOFAIL, but that may not be true in other
> > 
> 
> This is needed for 2.6.31-rc2.
> 
> 
> mm: update alloc_flags after oom killer has been called
> 
> It is possible for the oom killer to select current as the task to kill.  
> When this happens, alloc_flags needs to be updated accordingly to set 
> ALLOC_NO_WATERMARKS so the subsequent allocation attempt may use memory 
> reserves as the result of its thread having TIF_MEMDIE set if the 
> allocation is not __GFP_NOMEMALLOC.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1756,6 +1756,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  
>  	wake_all_kswapd(order, zonelist, high_zoneidx);
>  
> +restart:
>  	/*
>  	 * OK, we're below the kswapd watermark and have kicked background
>  	 * reclaim. Now things get more complex, so set up alloc_flags according
> @@ -1763,7 +1764,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 */
>  	alloc_flags = gfp_to_alloc_flags(gfp_mask);
>  
> -restart:
>  	/* This is the last chance, in general, before the goto nopage. */
>  	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
>  			high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

  reply	other threads:[~2009-06-30 20:32 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-24 15:07 upcoming kerneloops.org item: get_page_from_freelist Arjan van de Ven
2009-06-24 16:46 ` Andrew Morton
2009-06-24 16:52   ` Linus Torvalds
2009-06-24 16:55   ` Pekka Enberg
2009-06-24 16:56     ` Pekka Enberg
2009-06-24 17:00       ` Pekka Enberg
2009-06-24 17:55     ` Andrew Morton
2009-06-24 17:53       ` Pekka Enberg
2009-06-24 18:30         ` Andrew Morton
2009-06-24 18:42           ` Linus Torvalds
2009-06-24 18:44             ` Pekka Enberg
2009-06-24 18:50               ` Linus Torvalds
2009-06-24 19:12                 ` Pekka J Enberg
2009-06-24 19:21                   ` Linus Torvalds
2009-06-24 19:06             ` Andrew Morton
2009-06-24 19:16               ` Linus Torvalds
2009-06-24 19:36                 ` Andrew Morton
2009-06-24 19:46                   ` Linus Torvalds
2009-06-24 19:47                     ` Linus Torvalds
2009-06-24 20:01                     ` Andrew Morton
2009-06-24 20:13                       ` Linus Torvalds
2009-06-24 20:40                         ` Linus Torvalds
2009-06-24 22:07                           ` Andrew Morton
2009-06-25  4:05                             ` Nick Piggin
2009-06-25 13:25                             ` Theodore Tso
2009-06-25 18:51                               ` David Rientjes
2009-06-25 19:38                                 ` Theodore Tso
2009-06-25 19:44                                   ` Theodore Tso
2009-06-25 19:55                                     ` Andrew Morton
2009-06-25 20:11                                     ` Linus Torvalds
2009-06-25 20:22                                       ` Linus Torvalds
2009-06-25 20:36                                         ` David Rientjes
2009-06-25 20:51                                           ` Linus Torvalds
2009-06-25 22:25                                             ` David Rientjes
2009-06-26  8:51                                         ` Nick Piggin
2009-06-25 20:18                                     ` David Rientjes
2009-06-25 20:37                                       ` Theodore Tso
2009-06-25 21:05                                         ` Joel Becker
2009-06-25 21:05                                           ` Joel Becker
2009-06-25 21:26                                         ` Andreas Dilger
2009-06-25 21:26                                           ` Andreas Dilger
2009-06-25 22:05                                           ` Theodore Tso
2009-06-25 22:11                                             ` Eric Sandeen
2009-06-25 22:11                                               ` Eric Sandeen
2009-06-26  1:11                                               ` Theodore Tso
2009-06-26  5:16                                                 ` Pekka J Enberg
2009-06-26  8:56                                                   ` Nick Piggin
2009-06-26  8:58                                                     ` Pekka Enberg
2009-06-26  9:07                                                       ` Nick Piggin
2009-06-29 21:06                                                       ` Christoph Lameter
2009-06-30  7:59                                                         ` Nick Piggin
2009-06-26 14:41                                                   ` Eric Sandeen
2009-06-29 21:15                                                     ` Christoph Lameter
2009-06-29 21:20                                                       ` Eric Sandeen
2009-06-29 22:35                                                         ` Christoph Lameter
2009-06-25 19:55                             ` Jens Axboe
2009-06-25 20:08                               ` Jens Axboe
2009-06-24 21:56                         ` Andrew Morton
2009-06-25  4:14                           ` Nick Piggin
2009-06-25  8:21                           ` David Rientjes
2009-06-29 15:30                           ` Mel Gorman
2009-06-29 19:20                             ` Andrew Morton
2009-06-30 11:00                               ` Mel Gorman
2009-06-30 19:35                                 ` David Rientjes
2009-06-30 20:32                                   ` Mel Gorman [this message]
2009-06-30 20:51                                     ` David Rientjes
2009-07-01 10:22                                       ` Mel Gorman
2009-06-29 23:35                             ` David Rientjes
2009-06-30  7:47                               ` Nick Piggin
2009-06-30  8:13                                 ` David Rientjes
2009-06-30  8:24                                   ` Nick Piggin
2009-06-30  8:41                                     ` David Rientjes
2009-06-30  9:09                                       ` Nick Piggin
2009-06-30 19:47                                         ` David Rientjes
2009-06-30  6:27                           ` Pavel Machek
2009-06-28 10:16                     ` Pavel Machek
2009-06-28 18:01                       ` Linus Torvalds
2009-06-28 18:27                         ` Arjan van de Ven
2009-06-28 18:36                           ` Linus Torvalds
2009-06-30  7:35                         ` Pavel Machek
2009-06-24 18:43           ` Pekka Enberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090630203249.GC6689@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=cl@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.