Re: [PATCH] mm: avoid livelock on !__GFP_FS allocations

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@linux-foundation.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Colin Cross <ccross@android.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Rientjes <rientjes@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] mm: avoid livelock on !__GFP_FS allocations
Date: Mon, 14 Nov 2011 15:03:26 -0800	[thread overview]
Message-ID: <20111114150326.0ee60107.akpm@linux-foundation.org> (raw)
In-Reply-To: <20111114140421.GA27150@suse.de>

On Mon, 14 Nov 2011 14:04:21 +0000
Mel Gorman <mgorman@suse.de> wrote:

> This patch seems to have gotten lost in the cracks and the discussion
> on alternatives that started here https://lkml.org/lkml/2011/10/25/24
> petered out without any alternative patches being posted. Lacking
> a viable alternative patch, I'm reposting this patch because AFAIK,
> this bug still exists.
> 
> Colin Cross reported;
> 
>   Under the following conditions, __alloc_pages_slowpath can loop forever:
>   gfp_mask & __GFP_WAIT is true
>   gfp_mask & __GFP_FS is false
>   reclaim and compaction make no progress
>   order <= PAGE_ALLOC_COSTLY_ORDER
> 
>   These conditions happen very often during suspend and resume,
>   when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL
>   allocations into __GFP_WAIT.
> 
>   The oom killer is not run because gfp_mask & __GFP_FS is false,
>   but should_alloc_retry will always return true when order is less
>   than PAGE_ALLOC_COSTLY_ORDER.
> 
> In his fix, he avoided retrying the allocation if reclaim made no
> progress and __GFP_FS was not set. The problem is that this would
> result in GFP_NOIO allocations failing that previously succeeded
> which would be very unfortunate.
> 
> The big difference between GFP_NOIO and suspend converting GFP_KERNEL
> to behave like GFP_NOIO is that normally flushers will be cleaning
> pages and kswapd reclaims pages allowing GFP_NOIO to succeed after
> a short delay. The same does not necessarily apply during suspend as
> the storage device may be suspended.  Hence, this patch special cases
> the suspend case to fail the page allocation if reclaim cannot make
> progress. This might cause suspend to abort but that is better than
> a livelock.

Fair enough.

> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9dd443d..5402897 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -127,6 +127,20 @@ void pm_restrict_gfp_mask(void)
>  	saved_gfp_mask = gfp_allowed_mask;
>  	gfp_allowed_mask &= ~GFP_IOFS;
>  }
> +
> +static bool pm_suspending(void)
> +{
> +	if ((gfp_allowed_mask & GFP_IOFS) == GFP_IOFS)
> +		return false;
> +	return true;
> +}

This doesn't seem a terribly reliable way of detecting that PM has
disabled the storage devices (which is what we really want to know
here: kswapd got crippled).

I guess it's safe for now, because PM is the only caller who alters
gfp_allowed_mask (I assume).  But an explicit storage_is_unavaliable
global which is set and cleared at exactly the correct time is clearer,
more direct and future-safer, no?

> +#else
> +
> +static bool pm_suspending(void)
> +{
> +	return false;
> +}
>  #endif /* CONFIG_PM_SLEEP */
>  
>  #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
> @@ -2214,6 +2228,14 @@ rebalance:
>  
>  			goto restart;
>  		}
> +
> +		/*
> +		 * Suspend converts GFP_KERNEL to __GFP_WAIT which can
> +		 * prevent reclaim making forward progress without
> +		 * invoking OOM. Bail if we are suspending
> +		 */
> +		if (pm_suspending())
> +			goto nopage;

The comment doesn't tell the whole story: it's important that kswapd
writeout was disabled?

--- a/mm/page_alloc.c~mm-avoid-livelock-on-__gfp_fs-allocations-fix
+++ a/mm/page_alloc.c
@@ -2263,9 +2263,10 @@ rebalance:
 		}
 
 		/*
-		 * Suspend converts GFP_KERNEL to __GFP_WAIT which can
-		 * prevent reclaim making forward progress without
-		 * invoking OOM. Bail if we are suspending
+		 * Suspend converts GFP_KERNEL to __GFP_WAIT which can prevent
+		 * reclaim making forward progress without invoking OOM.
+		 * Suspend also disables storage devices so kswapd cannot save
+		 * us.  Bail if we are suspending.
 		 */
 		if (pm_suspending())
 			goto nopage;
_

WARNING: multiple messages have this Message-ID (diff)

From: Andrew Morton <akpm@linux-foundation.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Colin Cross <ccross@android.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Rientjes <rientjes@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] mm: avoid livelock on !__GFP_FS allocations
Date: Mon, 14 Nov 2011 15:03:26 -0800	[thread overview]
Message-ID: <20111114150326.0ee60107.akpm@linux-foundation.org> (raw)
In-Reply-To: <20111114140421.GA27150@suse.de>

On Mon, 14 Nov 2011 14:04:21 +0000
Mel Gorman <mgorman@suse.de> wrote:

> This patch seems to have gotten lost in the cracks and the discussion
> on alternatives that started here https://lkml.org/lkml/2011/10/25/24
> petered out without any alternative patches being posted. Lacking
> a viable alternative patch, I'm reposting this patch because AFAIK,
> this bug still exists.
> 
> Colin Cross reported;
> 
>   Under the following conditions, __alloc_pages_slowpath can loop forever:
>   gfp_mask & __GFP_WAIT is true
>   gfp_mask & __GFP_FS is false
>   reclaim and compaction make no progress
>   order <= PAGE_ALLOC_COSTLY_ORDER
> 
>   These conditions happen very often during suspend and resume,
>   when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL
>   allocations into __GFP_WAIT.
> 
>   The oom killer is not run because gfp_mask & __GFP_FS is false,
>   but should_alloc_retry will always return true when order is less
>   than PAGE_ALLOC_COSTLY_ORDER.
> 
> In his fix, he avoided retrying the allocation if reclaim made no
> progress and __GFP_FS was not set. The problem is that this would
> result in GFP_NOIO allocations failing that previously succeeded
> which would be very unfortunate.
> 
> The big difference between GFP_NOIO and suspend converting GFP_KERNEL
> to behave like GFP_NOIO is that normally flushers will be cleaning
> pages and kswapd reclaims pages allowing GFP_NOIO to succeed after
> a short delay. The same does not necessarily apply during suspend as
> the storage device may be suspended.  Hence, this patch special cases
> the suspend case to fail the page allocation if reclaim cannot make
> progress. This might cause suspend to abort but that is better than
> a livelock.

Fair enough.

> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9dd443d..5402897 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -127,6 +127,20 @@ void pm_restrict_gfp_mask(void)
>  	saved_gfp_mask = gfp_allowed_mask;
>  	gfp_allowed_mask &= ~GFP_IOFS;
>  }
> +
> +static bool pm_suspending(void)
> +{
> +	if ((gfp_allowed_mask & GFP_IOFS) == GFP_IOFS)
> +		return false;
> +	return true;
> +}

This doesn't seem a terribly reliable way of detecting that PM has
disabled the storage devices (which is what we really want to know
here: kswapd got crippled).

I guess it's safe for now, because PM is the only caller who alters
gfp_allowed_mask (I assume).  But an explicit storage_is_unavaliable
global which is set and cleared at exactly the correct time is clearer,
more direct and future-safer, no?

> +#else
> +
> +static bool pm_suspending(void)
> +{
> +	return false;
> +}
>  #endif /* CONFIG_PM_SLEEP */
>  
>  #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
> @@ -2214,6 +2228,14 @@ rebalance:
>  
>  			goto restart;
>  		}
> +
> +		/*
> +		 * Suspend converts GFP_KERNEL to __GFP_WAIT which can
> +		 * prevent reclaim making forward progress without
> +		 * invoking OOM. Bail if we are suspending
> +		 */
> +		if (pm_suspending())
> +			goto nopage;

The comment doesn't tell the whole story: it's important that kswapd
writeout was disabled?

--- a/mm/page_alloc.c~mm-avoid-livelock-on-__gfp_fs-allocations-fix
+++ a/mm/page_alloc.c
@@ -2263,9 +2263,10 @@ rebalance:
 		}
 
 		/*
-		 * Suspend converts GFP_KERNEL to __GFP_WAIT which can
-		 * prevent reclaim making forward progress without
-		 * invoking OOM. Bail if we are suspending
+		 * Suspend converts GFP_KERNEL to __GFP_WAIT which can prevent
+		 * reclaim making forward progress without invoking OOM.
+		 * Suspend also disables storage devices so kswapd cannot save
+		 * us.  Bail if we are suspending.
 		 */
 		if (pm_suspending())
 			goto nopage;
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2011-11-14 23:03 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-14 14:04 [PATCH] mm: avoid livelock on !__GFP_FS allocations Mel Gorman
2011-11-14 14:04 ` Mel Gorman
2011-11-14 18:38 ` Andrea Arcangeli
2011-11-14 18:38   ` Andrea Arcangeli
2011-11-15 10:30   ` Mel Gorman
2011-11-15 10:30     ` Mel Gorman
2011-11-14 23:03 ` Andrew Morton [this message]
2011-11-14 23:03   ` Andrew Morton
2011-11-15 10:42   ` Mel Gorman
2011-11-15 10:42     ` Mel Gorman
2011-11-15 15:43     ` Mel Gorman
2011-11-15 15:43       ` Mel Gorman
2011-11-15 16:13 ` Minchan Kim
2011-11-15 16:13   ` Minchan Kim
2011-11-15 17:36   ` Mel Gorman
2011-11-15 17:36     ` Mel Gorman
2011-11-16  0:22     ` Minchan Kim
2011-11-16  0:22       ` Minchan Kim
2011-11-16  0:28       ` Colin Cross
2011-11-16  0:28         ` Colin Cross
2011-11-16  0:45         ` Minchan Kim
2011-11-16  0:45           ` Minchan Kim
2011-11-16  7:10           ` Pekka Enberg
2011-11-16  7:10             ` Pekka Enberg
2011-11-16 21:44             ` David Rientjes
2011-11-16 21:44               ` David Rientjes
2011-11-16 21:58               ` Rafael J. Wysocki
2011-11-16 21:58                 ` Rafael J. Wysocki
2011-11-16 22:07               ` Minchan Kim
2011-11-16 22:07                 ` Minchan Kim
2011-11-16 22:48                 ` David Rientjes
2011-11-16 22:48                   ` David Rientjes
2011-11-15 21:40 ` David Rientjes
2011-11-15 21:40   ` David Rientjes
2011-11-16  9:52   ` Mel Gorman
2011-11-16  9:52     ` Mel Gorman
2011-11-16 21:39     ` David Rientjes
2011-11-16 21:39       ` David Rientjes
  -- strict thread matches above, loose matches on Subject: below --
2011-10-25  6:39 Colin Cross
2011-10-25  6:39 ` Colin Cross
2011-10-25  7:40 ` Pekka Enberg
2011-10-25  7:40   ` Pekka Enberg
2011-10-25  7:51   ` Colin Cross
2011-10-25  7:51     ` Colin Cross
2011-10-25  8:08     ` Pekka Enberg
2011-10-25  8:08       ` Pekka Enberg
2011-10-25 22:12     ` David Rientjes
2011-10-25 22:12       ` David Rientjes
2011-10-25  9:09 ` Mel Gorman
2011-10-25  9:09   ` Mel Gorman
2011-10-25  9:26   ` Colin Cross
2011-10-25  9:26     ` Colin Cross
2011-10-25 11:23     ` Mel Gorman
2011-10-25 11:23       ` Mel Gorman
2011-10-25 17:08       ` Colin Cross
2011-10-25 17:08         ` Colin Cross
2011-11-01 12:28         ` Mel Gorman
2011-11-01 12:28           ` Mel Gorman
2011-10-25 19:39       ` Pekka Enberg
2011-10-25 19:39         ` Pekka Enberg
2011-11-01 12:29         ` Mel Gorman
2011-11-01 12:29           ` Mel Gorman
2011-10-25 19:29   ` Colin Cross
2011-10-25 19:29     ` Colin Cross
2011-10-25 22:18   ` David Rientjes
2011-10-25 22:18     ` David Rientjes
2011-10-26  1:46     ` Colin Cross
2011-10-26  1:46       ` Colin Cross
2011-10-26  5:47       ` David Rientjes
2011-10-26  5:47         ` David Rientjes
2011-10-26  6:12         ` David Rientjes
2011-10-26  6:12           ` David Rientjes
2011-10-26  6:16           ` Colin Cross
2011-10-26  6:16             ` Colin Cross
2011-10-26  6:24             ` David Rientjes
2011-10-26  6:24               ` David Rientjes
2011-10-26  6:26               ` Colin Cross
2011-10-26  6:26                 ` Colin Cross
2011-10-26  6:33                 ` David Rientjes
2011-10-26  6:33                   ` David Rientjes
2011-10-26  6:36                   ` Colin Cross
2011-10-26  6:36                     ` Colin Cross
2011-10-26  6:51                     ` David Rientjes
2011-10-26  6:51                       ` David Rientjes
2011-10-26  6:57                       ` Colin Cross
2011-10-26  6:57                         ` Colin Cross
2011-10-26  7:10                         ` David Rientjes
2011-10-26  7:10                           ` David Rientjes
2011-10-26  7:22                           ` Colin Cross
2011-10-26  7:22                             ` Colin Cross
2011-11-01 12:36                             ` Mel Gorman
2011-11-01 12:36                               ` Mel Gorman
2011-10-25 22:10 ` David Rientjes
2011-10-25 22:10   ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111114150326.0ee60107.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=aarcange@redhat.com \
    --cc=ccross@android.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.