Re: zone state overhead


From: Mel Gorman <mel@csn.ul.ie>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Shaohua Li <shaohua.li@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"cl@linux.com" <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: zone state overhead
Date: Wed, 27 Oct 2010 09:19:11 +0100
Message-ID: <20101027081910.GI5383@csn.ul.ie>
In-Reply-To: <20101025132824.9176.A69D9226@jp.fujitsu.com>

On Mon, Oct 25, 2010 at 01:46:19PM +0900, KOSAKI Motohiro wrote:
> > - * Return 1 if free pages are above 'mark'. This takes into account the order
> > + * Return true if free pages are above 'mark'. This takes into account the order
> >   * of the allocation.
> >   */
> > -int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
> > -		      int classzone_idx, int alloc_flags)
> > +bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
> > +		      int classzone_idx, int alloc_flags, long free_pages)
> 
> static?
> 

Yes, it should be.

> 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index c5dfabf..ba0c70a 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2082,7 +2082,7 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
> >  		if (zone->all_unreclaimable)
> >  			continue;
> >  
> > -		if (!zone_watermark_ok(zone, order, high_wmark_pages(zone),
> > +		if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
> >  								0, 0))
> >  			return 1;
> >  	}
> 
> Do we need to change balance_pgdat() too?
> Otherwise, balance_pgdat() returns immediately and can cause a semi-infinite busy loop.
> 

While balance_pgdat() is calling zone_watermark_ok(), the thresholds are
very low and the expected level of drift is minimal. I judged the
semi-infinite busy loop to last 2 seconds in the worst case, until the
vmstat counters were synced and the zone_watermark_ok* variants agreed.
It is reasonable to expect normal allocate/free activity to sync the
values seen by zone_watermark_ok* before that timeout.

To my surprise, though, using zone_watermark_ok_safe() in balance_pgdat()
does not significantly increase the amount of time spent in the _safe()
function, so the next version will call it there.

> 
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 355a9e6..ddee139 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -81,6 +81,12 @@ EXPORT_SYMBOL(vm_stat);
> >  
> >  #ifdef CONFIG_SMP
> >  
> > +static int calculate_pressure_threshold(struct zone *zone)
> > +{
> > +	return max(1, (int)((high_wmark_pages(zone) - low_wmark_pages(zone)) /
> > +				num_online_cpus()));
> > +}
> 
> On Shaohua's machine,
> 
> 	CPU: 64
> 	MEM: 8GBx4 (=32GB)
> 	per-cpu vm-stat threshold: 98
> 
> 	zone->min = sqrt(32x1024x1024x16)/4 = 5792 KB = 1448 pages
> 	zone->high - zone->low = zone->min/4 = 362pages
> 	pressure-vm-threshold = 362/64 ~= 5
> 
> Hrm, this reduction (98->5) seems rather dramatic.

Yes, but consider the maximum possible drift:

	percpu-maximum-drift = 5*64 = 320

The value is massively reduced and the cost goes up, but this is the value
necessary to avoid a situation where the high watermark is "ok" when in fact
the min watermark can be breached.

> Shaohua, can you please rerun your problem workload on your 64-CPU machine with
> this patch applied?
> Of course, if there is no performance degradation, I'm not against this one.
> 

Your patches that adjusted min and high may allow this threshold to grow again.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


