From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>, Mel Gorman <mel@csn.ul.ie>,
	Linux Kernel List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails
Date: Sat, 4 Sep 2010 12:37:12 +0800
Message-ID: <20100904043712.GA17217@localhost>
In-Reply-To: <20100903205945.44e1aa38.akpm@linux-foundation.org>

On Sat, Sep 04, 2010 at 11:59:45AM +0800, Andrew Morton wrote:
> On Sat, 4 Sep 2010 11:23:11 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > > Still, given the improvements in performance from this patchset,
> > > I'd say inclusion is a no-brainer....
> > 
> > In your case it's not really high memory pressure, but maybe too many
> > concurrent direct reclaimers, so that when one reclaimed some free
> > pages, others kick in and "steal" the free pages. So we need to kill
> > the second cond_resched() call (which effectively gives other tasks a
> > good chance to steal this task's vmscan fruits), and only do
> > drain_all_pages() when nothing was reclaimed (instead of allocated).
> 
> Well...  cond_resched() will only resched when this task has been
> marked for preemption.  If that's happening at such a high frequency
> then Something Is Up with the scheduler, and the reported context
> switch rate will be high.

Yes, it may not necessarily schedule away. But if it ever does, the
task will likely run into drain_all_pages() when it regains the CPU.
Because drain_all_pages() is very expensive, it doesn't take many
reschedules to create an IPI storm.

> > Dave, could you give this patch a try? It's based on Mel's.
> > 
> > 
> > --- linux-next.orig/mm/page_alloc.c	2010-09-04 11:08:03.000000000 +0800
> > +++ linux-next/mm/page_alloc.c	2010-09-04 11:16:33.000000000 +0800
> > @@ -1850,6 +1850,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
> >  
> >  	cond_resched();
> >  
> > +retry:
> >  	/* We now go into synchronous reclaim */
> >  	cpuset_memory_pressure_bump();
> >  	p->flags |= PF_MEMALLOC;
> > @@ -1863,26 +1864,23 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
> >  	lockdep_clear_current_reclaim_state();
> >  	p->flags &= ~PF_MEMALLOC;
> >  
> > -	cond_resched();
> > -
> > -	if (unlikely(!(*did_some_progress)))
> > +	if (unlikely(!(*did_some_progress))) {
> > +		if (!drained) {
> > +			drain_all_pages();
> > +			drained = true;
> > +			goto retry;
> > +		}
> >  		return NULL;
> > +	}
> >  
> > -retry:
> >  	page = get_page_from_freelist(gfp_mask, nodemask, order,
> >  					zonelist, high_zoneidx,
> >  					alloc_flags, preferred_zone,
> >  					migratetype);
> >  
> > -	/*
> > -	 * If an allocation failed after direct reclaim, it could be because
> > -	 * pages are pinned on the per-cpu lists. Drain them and try again
> > -	 */
> > -	if (!page && !drained) {
> > -		drain_all_pages();
> > -		drained = true;
> > +	/* someone steal our vmscan fruits? */
> > +	if (!page && *did_some_progress)
> >  		goto retry;
> > -	}
> 
> Perhaps the fruit-stealing event is worth adding to the
> userspace-exposed vm stats somewhere.  But not in /proc - somewhere
> more temporary, in debugfs.

There are no existing debugfs interfaces for vm stats, and I need to
go out right now. So I did the following quick (and temporary) hack
to allow Dave to collect the information. I will revisit the proper
interface to use later :)

Thanks,
Fengguang
---
 include/linux/mmzone.h |    1 +
 mm/page_alloc.c        |    4 +++-
 mm/vmstat.c            |    1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

--- linux-next.orig/include/linux/mmzone.h	2010-09-04 12:30:26.000000000 +0800
+++ linux-next/include/linux/mmzone.h	2010-09-04 12:30:36.000000000 +0800
@@ -104,6 +104,7 @@ enum zone_stat_item {
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
+	NR_RECLAIM_STEAL,
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
--- linux-next.orig/mm/page_alloc.c	2010-09-04 12:28:09.000000000 +0800
+++ linux-next/mm/page_alloc.c	2010-09-04 12:33:39.000000000 +0800
@@ -1879,8 +1879,10 @@ retry:
 					migratetype);
 
 	/* someone steal our vmscan fruits? */
-	if (!page && *did_some_progress)
+	if (!page && *did_some_progress) {
+		inc_zone_state(preferred_zone, NR_RECLAIM_STEAL);
 		goto retry;
+	}
 
 	return page;
 }
--- linux-next.orig/mm/vmstat.c	2010-09-04 12:31:30.000000000 +0800
+++ linux-next/mm/vmstat.c	2010-09-04 12:31:42.000000000 +0800
@@ -732,6 +732,7 @@ static const char * const vmstat_text[] 
 	"nr_isolated_anon",
 	"nr_isolated_file",
 	"nr_shmem",
+	"nr_reclaim_steal",
 #ifdef CONFIG_NUMA
 	"numa_hit",
 	"numa_miss",

