* Re: [patch] mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator
From: Rik van Riel @ 2011-01-24 17:09 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, Minchan Kim,
Wu Fengguang, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jens Axboe,
linux-mm
On 01/23/2011 05:58 PM, David Rientjes wrote:
> 0e093d99763e (writeback: do not sleep on the congestion queue if there
> are no congested BDIs or if significant congestion is not being
> encountered in the current zone) uncovered a livelock in the page
> allocator that resulted in tasks infinitely looping trying to find memory
> and kswapd running at 100% cpu.
>
> The issue occurs because drain_all_pages() is called immediately
> following direct reclaim when no memory is freed and try_to_free_pages()
> returns non-zero because all zones in the zonelist do not have their
> all_unreclaimable flag set.
Reviewed-by: Rik van Riel <riel@redhat.com>
* Re: [patch] mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator
From: Johannes Weiner @ 2011-01-25 8:42 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, Mel Gorman, Minchan Kim, Wu Fengguang,
KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Jens Axboe,
linux-mm
On Sun, Jan 23, 2011 at 02:58:39PM -0800, David Rientjes wrote:
> 0e093d99763e (writeback: do not sleep on the congestion queue if there
> are no congested BDIs or if significant congestion is not being
> encountered in the current zone) uncovered a livelock in the page
> allocator that resulted in tasks infinitely looping trying to find memory
> and kswapd running at 100% cpu.
>
> The issue occurs because drain_all_pages() is called immediately
> following direct reclaim when no memory is freed and try_to_free_pages()
> returns non-zero because all zones in the zonelist do not have their
> all_unreclaimable flag set.
>
> When draining the per-cpu pagesets back to the buddy allocator for each
> zone, the zone->pages_scanned counter is cleared to avoid erroneously
> setting zone->all_unreclaimable later. The problem is that no pages may
> actually be drained and, thus, the all_unreclaimable logic never allows
> direct reclaim to fail, so the oom killer is never invoked.
>
> This apparently only manifested after wait_iff_congested() was introduced
> and the zone was full of anonymous memory that would not congest the
> backing store. If there were no other tasks waiting to be scheduled, the
> page allocator would loop infinitely and, as a result of this change, keep
> clearing zone->pages_scanned via drain_all_pages() before kswapd could
> scan enough pages to trigger the reclaim logic. Additionally, with every
> loop of the page allocator and in the reclaim path, kswapd would be
> kicked and would end up running at 100% cpu. In this scenario, current
> and kswapd are both running continuously, with kswapd incrementing
> zone->pages_scanned and current clearing it.
>
> The problem is even more pronounced when current swaps some of its memory
> to swap cache and the reclaimable logic then considers all active
> anonymous memory in the all_unreclaimable logic, which requires a much
> higher zone->pages_scanned value for try_to_free_pages() to return zero,
> a level that is never attainable in this scenario.
>
> Before wait_iff_congested(), the page allocator would incur an
> unconditional timeout and allow kswapd to elevate zone->pages_scanned to
> a level at which the oom killer would be called the next time it loops.
>
> The fix is to only attempt to drain pcp pages if there is actually a
> quantity to be drained. The unconditional clearing of
> zone->pages_scanned in free_pcppages_bulk() need not be changed since
> other callers already ensure that draining will occur. This patch
> ensures that drain_all_pages() only calls into free_pcppages_bulk() when
> there is actually memory to free, so zone->pages_scanned is only
> cleared if appropriate.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [patch] mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator
From: Mel Gorman @ 2011-01-26 8:51 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, Johannes Weiner, Minchan Kim, Wu Fengguang,
KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Jens Axboe,
linux-mm
On Sun, Jan 23, 2011 at 02:58:39PM -0800, David Rientjes wrote:
> 0e093d99763e (writeback: do not sleep on the congestion queue if there
> are no congested BDIs or if significant congestion is not being
> encountered in the current zone) uncovered a livelock in the page
> allocator that resulted in tasks infinitely looping trying to find memory
> and kswapd running at 100% cpu.
>
> The issue occurs because drain_all_pages() is called immediately
> following direct reclaim when no memory is freed and try_to_free_pages()
> returns non-zero because all zones in the zonelist do not have their
> all_unreclaimable flag set.
>
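For context, this drain sits in the allocator's direct reclaim slow path. A rough sketch of how that call site looked in kernels of this era follows (argument lists and the reclaim_state/PF_MEMALLOC bookkeeping are elided, so treat the exact signatures as approximate):

static inline struct page *
__alloc_pages_direct_reclaim(..., unsigned long *did_some_progress)
{
	struct page *page = NULL;
	bool drained = false;

	*did_some_progress = try_to_free_pages(zonelist, order,
					       gfp_mask, nodemask);
	if (unlikely(!(*did_some_progress)))
		return NULL;

retry:
	page = get_page_from_freelist(...);

	/*
	 * The allocation failed even though reclaim reported progress,
	 * possibly because pages are pinned on the per-cpu lists: drain
	 * them and retry once.
	 */
	if (!page && !drained) {
		drain_all_pages();
		drained = true;
		goto retry;
	}

	return page;
}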
> When draining the per-cpu pagesets back to the buddy allocator for each
> zone, the zone->pages_scanned counter is cleared to avoid erroneously
> setting zone->all_unreclaimable later. The problem is that no pages may
> actually be drained and, thus, the all_unreclaimable logic never allows
> direct reclaim to fail, so the oom killer is never invoked.
>
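The clearing in question happens at the top of free_pcppages_bulk(); roughly, with the bulk-freeing loop elided (a sketch based on the page allocator of that era):

static void free_pcppages_bulk(struct zone *zone, int count,
				struct per_cpu_pages *pcp)
{
	spin_lock(&zone->lock);
	/* Cleared unconditionally, even when count == 0 and nothing is drained. */
	zone->all_unreclaimable = 0;
	zone->pages_scanned = 0;

	/* ... hand 'count' pages from the pcp lists back to the buddy lists ... */

	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
	spin_unlock(&zone->lock);
}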
> This apparently only manifested after wait_iff_congested() was introduced
> and the zone was full of anonymous memory that would not congest the
> backing store. If there were no other tasks waiting to be scheduled, the
> page allocator would loop infinitely and, as a result of this change, keep
> clearing zone->pages_scanned via drain_all_pages() before kswapd could
> scan enough pages to trigger the reclaim logic. Additionally, with every
> loop of the page allocator and in the reclaim path, kswapd would be
> kicked and would end up running at 100% cpu. In this scenario, current
> and kswapd are both running continuously, with kswapd incrementing
> zone->pages_scanned and current clearing it.
>
> The problem is even more pronounced when current swaps some of its memory
> to swap cache and the reclaimable logic then considers all active
> anonymous memory in the all_unreclaimable logic, which requires a much
> higher zone->pages_scanned value for try_to_free_pages() to return zero,
> a level that is never attainable in this scenario.
>
> Before wait_iff_congested(), the page allocator would incur an
> unconditional timeout and allow kswapd to elevate zone->pages_scanned to
> a level at which the oom killer would be called the next time it loops.
>
> The fix is to only attempt to drain pcp pages if there is actually a
> quantity to be drained. The unconditional clearing of
> zone->pages_scanned in free_pcppages_bulk() need not be changed since
> other callers already ensure that draining will occur. This patch
> ensures that drain_all_pages() only calls into free_pcppages_bulk() when
> there is actually memory to free, so zone->pages_scanned is only
> cleared if appropriate.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
Nice analysis and I cannot spot any flaw;
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
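For reference on the hunk below: drain_all_pages() simply runs drain_pages() on every cpu, so the added check covers every drain triggered from the reclaim path. The entry points looked roughly like this at the time:

/* Drain the per-cpu pagesets of the calling cpu. */
void drain_local_pages(void *arg)
{
	drain_pages(smp_processor_id());
}

/* Drain the per-cpu pagesets of all cpus. */
void drain_all_pages(void)
{
	on_each_cpu(drain_local_pages, NULL, 1);
}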
> ---
> mm/page_alloc.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1088,8 +1088,10 @@ static void drain_pages(unsigned int cpu)
> pset = per_cpu_ptr(zone->pageset, cpu);
>
> pcp = &pset->pcp;
> - free_pcppages_bulk(zone, pcp->count, pcp);
> - pcp->count = 0;
> + if (pcp->count) {
> + free_pcppages_bulk(zone, pcp->count, pcp);
> + pcp->count = 0;
> + }
> local_irq_restore(flags);
> }
> }
>
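Put together, the patched drain_pages() reads as follows (reconstructed from the hunk above; the per-zone loop is assumed from the surrounding code of that era):

static void drain_pages(unsigned int cpu)
{
	unsigned long flags;
	struct zone *zone;

	for_each_populated_zone(zone) {
		struct per_cpu_pageset *pset;
		struct per_cpu_pages *pcp;

		local_irq_save(flags);
		pset = per_cpu_ptr(zone->pageset, cpu);

		pcp = &pset->pcp;
		/* Only drain, and thus clear zone->pages_scanned, if pages are queued. */
		if (pcp->count) {
			free_pcppages_bulk(zone, pcp->count, pcp);
			pcp->count = 0;
		}
		local_irq_restore(flags);
	}
}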
--
Mel Gorman
Linux Technology Center
IBM Dublin Software Lab
* Re: [patch] mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator
From: Minchan Kim @ 2011-01-30 2:10 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, Wu Fengguang,
KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Jens Axboe,
linux-mm
On Mon, Jan 24, 2011 at 7:58 AM, David Rientjes <rientjes@google.com> wrote:
> 0e093d99763e (writeback: do not sleep on the congestion queue if there
> are no congested BDIs or if significant congestion is not being
> encountered in the current zone) uncovered a livelock in the page
> allocator that resulted in tasks infinitely looping trying to find memory
> and kswapd running at 100% cpu.
>
> The issue occurs because drain_all_pages() is called immediately
> following direct reclaim when no memory is freed and try_to_free_pages()
> returns non-zero because all zones in the zonelist do not have their
> all_unreclaimable flag set.
>
> When draining the per-cpu pagesets back to the buddy allocator for each
> zone, the zone->pages_scanned counter is cleared to avoid erroneously
> setting zone->all_unreclaimable later. The problem is that no pages may
> actually be drained and, thus, the all_unreclaimable logic never allows
> direct reclaim to fail, so the oom killer is never invoked.
>
> This apparently only manifested after wait_iff_congested() was introduced
> and the zone was full of anonymous memory that would not congest the
> backing store. If there were no other tasks waiting to be scheduled, the
> page allocator would loop infinitely and, as a result of this change, keep
> clearing zone->pages_scanned via drain_all_pages() before kswapd could
> scan enough pages to trigger the reclaim logic. Additionally, with every
> loop of the page allocator and in the reclaim path, kswapd would be
> kicked and would end up running at 100% cpu. In this scenario, current
> and kswapd are both running continuously, with kswapd incrementing
> zone->pages_scanned and current clearing it.
>
> The problem is even more pronounced when current swaps some of its memory
> to swap cache and the reclaimable logic then considers all active
> anonymous memory in the all_unreclaimable logic, which requires a much
> higher zone->pages_scanned value for try_to_free_pages() to return zero,
> a level that is never attainable in this scenario.
>
> Before wait_iff_congested(), the page allocator would incur an
> unconditional timeout and allow kswapd to elevate zone->pages_scanned to
> a level at which the oom killer would be called the next time it loops.
>
> The fix is to only attempt to drain pcp pages if there is actually a
> quantity to be drained. The unconditional clearing of
> zone->pages_scanned in free_pcppages_bulk() need not be changed since
> other callers already ensure that draining will occur. This patch
> ensures that drain_all_pages() only calls into free_pcppages_bulk() when
> there is actually memory to free, so zone->pages_scanned is only
> cleared if appropriate.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
Good catch!!!!
Too late but,
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
--
Kind regards,
Minchan Kim