linux-mm.kvack.org archive mirror
* [RFC PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator
@ 2010-08-16  9:42 Mel Gorman
From: Mel Gorman @ 2010-08-16  9:42 UTC
  To: linux-mm
  Cc: Rik van Riel, Nick Piggin, Johannes Weiner, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Mel Gorman

Internal IBM test teams beta testing distribution kernels have reported
problems on machines with a large number of CPUs whereby page allocator
failure messages show huge differences between the nr_free_pages vmstat
counter and what is available on the buddy lists. In an extreme example,
nr_free_pages was above the min watermark but zero pages were on the buddy
lists, allowing the system to potentially deadlock. There is no reason why
these problems would not also affect mainline, so the following series
mitigates the problems in the page allocator related to per-cpu counter
drift and lists.
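To make the drift concrete, here is a small userspace sketch loosely modelled on how per-cpu vmstat deltas are folded into a zone counter. The structure, CPU count and threshold below are hypothetical illustration values, not kernel code. Each CPU accumulates a private delta and folds it into the global counter only once the delta exceeds a threshold. A plain read of the global value can therefore be wrong by up to NR_CPUS_DEMO * STAT_THRESHOLD pages.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes chosen for illustration only. */
#define NR_CPUS_DEMO   64
#define STAT_THRESHOLD 125

struct drift_counter {
	long global;               /* what a plain read of the counter sees */
	int  delta[NR_CPUS_DEMO];  /* per-cpu deltas not yet folded in */
};

/* Apply a change on one CPU; fold into the global value only once the
 * local delta's magnitude exceeds the threshold. */
static void counter_mod(struct drift_counter *c, int cpu, int change)
{
	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	int d = c->delta[cpu] + change;

	if (abs(d) > STAT_THRESHOLD) {
		c->global += d;
		d = 0;
	}
	c->delta[cpu] = d;
}

/* Exact value: the global counter plus every outstanding per-cpu delta. */
static long counter_exact(const struct drift_counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		sum += c->delta[cpu];
	return sum;
}

/* Worst-case error of a plain read: every CPU sits at the threshold. */
static long counter_max_drift(void)
{
	return (long)NR_CPUS_DEMO * STAT_THRESHOLD;
}
```

Start from a global value of 8000 and have each of the 64 CPUs consume 125 pages without crossing the threshold. The global read still reports 8000, while counter_exact() returns 0. This matches the shape of the failure reported above.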

The first patch ensures that counters are updated after pages are added to
free lists.
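The ordering fix can be illustrated with a sequential, hypothetical model that tracks the free list and the counter separately. These are not the kernel's data structures. Watermark checks are assumed to read the counter without holding the lock that protects the free list. Under that assumption, the invariant to preserve at every intermediate step is that the counter never advertises more pages than the list actually holds. The assert() calls stand in for such a reader observing each step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone model: list length and counter kept separately. */
struct zone_model {
	long on_free_list;  /* pages actually on the buddy free list */
	long nr_free;       /* NR_FREE_PAGES-style counter */
};

/* A lockless watermark reader is only safe if the counter never
 * exceeds what is really on the free list. */
static bool counter_is_safe(const struct zone_model *z)
{
	return z->nr_free <= z->on_free_list;
}

/* Free a batch: place every page on the list first, then account for
 * the whole batch, so every intermediate state satisfies the invariant. */
static void free_batch(struct zone_model *z, int count)
{
	for (int i = 0; i < count; i++) {
		z->on_free_list++;          /* page is now allocatable */
		assert(counter_is_safe(z));
	}
	z->nr_free += count;                /* only now is it advertised */
	assert(counter_is_safe(z));
}
```

Reversing the two steps, by bumping nr_free before the pages reach the list, would briefly make counter_is_safe() false. That window is the one the first patch closes.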

The second patch notes that the drift between the NR_FREE_PAGES counter
and the number of pages actually free can be very high because of per-cpu
counter deltas that have not yet been folded in. When memory is low and
kswapd is awake, the per-cpu deltas are read in addition to the value of
NR_FREE_PAGES. This will slow the page allocator when memory is low and
kswapd is awake, but it will be much harder to breach the min watermark and
potentially livelock the system.
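The estimate described for the second patch might look roughly like the following sketch. The field names (drift_mark, kswapd_awake, pcp_delta, cpu_online) and the watermark_ok() helper are hypothetical, and the loop over online CPUs stands in for for_each_online_cpu(). The per-cpu summation only runs when the cheap counter is already low and kswapd is awake, so the common path stays a single read.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_DEMO 4  /* hypothetical CPU count */

/* Hypothetical zone state; names are illustrative, not the kernel's. */
struct zone_est {
	long nr_free_pages;             /* cheap, possibly stale counter */
	int  pcp_delta[NR_CPUS_DEMO];   /* unfolded per-cpu deltas */
	long drift_mark;                /* below this, drift is dangerous */
	bool kswapd_awake;              /* zone is under memory pressure */
	bool cpu_online[NR_CPUS_DEMO];
};

/* Return the cheap counter normally; when it is low and kswapd is awake,
 * walk the online CPUs and fold in their outstanding deltas. */
static long zone_free_estimate(const struct zone_est *z)
{
	long estimate = z->nr_free_pages;

	if (estimate < z->drift_mark && z->kswapd_awake) {
		for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
			if (z->cpu_online[cpu])  /* cf. for_each_online_cpu() */
				estimate += z->pcp_delta[cpu];
		}
	}
	return estimate;
}

/* Watermark check that uses the estimate instead of the raw counter. */
static bool watermark_ok(const struct zone_est *z, long min)
{
	assert(min >= 0);
	return zone_free_estimate(z) > min;
}
```

Take a raw counter of 300 pages, outstanding per-cpu deltas totalling -200 and a min watermark of 128. The raw counter would pass the check, while the estimate of 100 fails it.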

The third patch notes that, after direct reclaim, an allocation can fail
because the pages it needs are sitting on the per-cpu lists. If an
allocation fails after direct reclaim, the per-cpu lists are drained and a
second attempt is made.
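The retry logic of the third patch can be sketched as follows. The model is hypothetical and simplified. The requesting CPU's own per-cpu list is assumed empty, so pages parked on other CPUs' lists are unreachable until they return to the buddy lists. drain_pcp_lists() stands in for the kernel's drain_all_pages(). The drained flag limits the drain to a single attempt, so a genuinely exhausted zone still fails.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_DEMO 4  /* hypothetical CPU count */

/* Hypothetical model: pages on the buddy lists vs pages parked on
 * other CPUs' per-cpu lists. */
struct alloc_model {
	long buddy_free;
	long pcp_count[NR_CPUS_DEMO];
	bool drained;  /* records whether the drain path already ran */
};

/* In this model an allocation can only be served from the buddy lists. */
static bool try_alloc(struct alloc_model *m, long nr_pages)
{
	assert(nr_pages > 0);
	if (m->buddy_free < nr_pages)
		return false;
	m->buddy_free -= nr_pages;
	return true;
}

/* Return every per-cpu page to the buddy lists (cf. drain_all_pages()). */
static void drain_pcp_lists(struct alloc_model *m)
{
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		m->buddy_free += m->pcp_count[cpu];
		m->pcp_count[cpu] = 0;
	}
	m->drained = true;
}

/* After direct reclaim: try once; on failure, drain and retry once. */
static bool alloc_after_direct_reclaim(struct alloc_model *m, long nr_pages)
{
	bool ok = try_alloc(m, nr_pages);

	if (!ok && !m->drained) {
		drain_pcp_lists(m);
		ok = try_alloc(m, nr_pages);
	}
	return ok;
}
```

Start with 2 pages on the buddy lists and 4 pages on each of four per-cpu lists. An 8-page request fails on the first attempt, succeeds after the drain, and leaves 10 pages on the buddy lists.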

Performance tests did not show anything interesting. A version of this
series that continually called vmstat_update() when memory was low was
tested internally and found to help the counter drift problem. I described
this at the LSF/MM Summit and the potential for IPI storms was frowned
upon. An alternative fix is in patch two, which uses for_each_online_cpu()
to read the vmstat deltas while memory is low and kswapd is awake. This
should be functionally similar.

Comments?

 include/linux/mmzone.h |    9 +++++++++
 mm/mmzone.c            |   27 +++++++++++++++++++++++++++
 mm/page_alloc.c        |   28 ++++++++++++++++++++++------
 mm/vmstat.c            |    5 ++++-
 4 files changed, 62 insertions(+), 7 deletions(-)




Thread overview: 49+ messages
2010-08-16  9:42 [RFC PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator Mel Gorman
2010-08-16  9:42 ` [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list Mel Gorman
2010-08-16 14:04   ` Rik van Riel
2010-08-16 15:26   ` Johannes Weiner
2010-08-17  2:21   ` Minchan Kim
2010-08-17  9:59     ` Mel Gorman
2010-08-17 14:25       ` Minchan Kim
2010-08-18  2:21   ` KAMEZAWA Hiroyuki
2010-08-16  9:42 ` [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Mel Gorman
2010-08-16  9:43   ` Mel Gorman
2010-08-16 14:47     ` Rik van Riel
2010-08-16 16:06     ` Johannes Weiner
2010-08-17  2:26       ` Minchan Kim
2010-08-17 10:42         ` Mel Gorman
2010-08-17 15:01           ` Minchan Kim
2010-08-17 15:05             ` Mel Gorman
2010-08-17 10:16       ` Mel Gorman
2010-08-17 11:05         ` Johannes Weiner
2010-08-17 14:20         ` Minchan Kim
2010-08-18  8:51           ` Mel Gorman
2010-08-18 14:57             ` Minchan Kim
2010-08-19  8:06               ` Mel Gorman
2010-08-19 10:33                 ` Minchan Kim
2010-08-19 10:38                   ` Mel Gorman
2010-08-19 14:01                     ` Minchan Kim
2010-08-19 14:09                       ` Mel Gorman
2010-08-19 14:34                         ` Minchan Kim
2010-08-19 15:07                           ` Mel Gorman
2010-08-19 15:22                             ` Minchan Kim
2010-08-19 15:40                               ` Mel Gorman
2010-08-19 15:44                                 ` Minchan Kim
2010-08-19 15:46     ` Minchan Kim
2010-08-19 16:06       ` Mel Gorman
2010-08-19 16:45         ` Minchan Kim
2010-08-18  2:59   ` KAMEZAWA Hiroyuki
2010-08-18 15:55     ` Christoph Lameter
2010-08-19  0:07       ` KAMEZAWA Hiroyuki
2010-08-19 19:00         ` Christoph Lameter
2010-08-19 23:49           ` KAMEZAWA Hiroyuki
2010-08-20  0:22             ` [PATCH] vmstat : update zone stat threshold at onlining a cpu KAMEZAWA Hiroyuki
2010-08-20 14:54               ` Christoph Lameter
2010-08-20 17:29                 ` Andrew Morton
2010-08-23  7:18               ` Mel Gorman
2010-08-16  9:42 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
2010-08-16 14:50   ` Rik van Riel
2010-08-17  2:57   ` Minchan Kim
2010-08-18  3:02   ` KAMEZAWA Hiroyuki
2010-08-19 14:47   ` Minchan Kim
2010-08-19 15:10     ` Mel Gorman
