From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 4/4] mm: page_alloc: Reduce cost of the fair zone allocation policy
Date: Mon, 30 Jun 2014 22:51:21 +0100
Message-ID: <20140630215121.GQ10819@suse.de>
In-Reply-To: <20140630141404.e09bdb5fa6a879d17c4556b1@linux-foundation.org>

On Mon, Jun 30, 2014 at 02:14:04PM -0700, Andrew Morton wrote:
> On Mon, 30 Jun 2014 17:48:03 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > The fair zone allocation policy round-robins allocations between zones
> > within a node to avoid age inversion problems during reclaim. If the
> > first allocation fails, the batch counts are reset and a second attempt
> > is made before entering the slow path.
> > 
> > One assumption made with this scheme is that batches expire at roughly the
> > same time and the resets each time are justified. This assumption does not
> > hold when zones reach their low watermark as the batches will be consumed
> > at uneven rates.  Allocation failure due to watermark depletion results in
> > additional zonelist scans for the reset and another watermark check before
> > hitting the slowpath.
> > 
> > This patch makes a number of changes that should reduce the overall cost:
> > 
> > o Abort the fair zone allocation policy once remote zones are encountered
> > o Use a simpler scan when resetting NR_ALLOC_BATCH
> > o Use a simple flag to identify depleted zones instead of accessing a
> >   potentially write-intensive cache line for counters
> > 
> > On UMA machines, the effect on overall performance is marginal. The main
> > impact is on system CPU usage which is small enough on UMA to begin with.
> > This comparison shows the system CPU usage between vanilla, the previous
> > patch and this patch.
> > 
> >           3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
> >              vanilla checklow-v4 fairzone-v4
> > User          390.13      400.85      396.13
> > System        404.41      393.60      389.61
> > Elapsed      5412.45     5166.12     5163.49
> > 
> > There is a small reduction and it appears consistent.
> > 
> > On NUMA machines, the scanning overhead is higher as zones are scanned
> > that are ineligible for use by the fair zone allocation policy. This patch
> > fixes the zone-order zonelist policy and reduces the number of zones scanned
> > by the allocator leading to an overall reduction of CPU usage.
> > 
> >           3.16.0-rc2  3.16.0-rc2  3.16.0-rc2
> >              vanilla checklow-v4 fairzone-v4
> > User          744.05      763.26      778.53
> > System      70148.60    49331.48    44905.73
> > Elapsed     28094.08    27476.72    27378.98
> 
> That's a large change in system time.  Does this all include kswapd
> activity?
> 

I don't have a profile to quantify that exactly. It takes 7 hours to
complete a test on that machine in this configuration and it would take
longer with profiling. I was not testing with profiling enabled as that
invalidates performance tests. I'd expect it'd take the guts of two days
to gather full profiles for it and even then it would be masked by remote
access costs and other factors. It'd be worse considering that automatic
NUMA balancing is enabled and I normally test with that turned on.

However, without the kswapd change there are a lot of retries and
reallocations for pages recently reclaimed. For the fairzone patch there
are far fewer scans of unusable zones to find the lower zones. Considering
the number of allocations required there is simply a lot of overhead that
builds up.
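
To illustrate where the scan savings come from, here is a minimal userspace
sketch of the three changes listed in the changelog: stop the fair pass at the
first remote zone, skip depleted zones using a flag, and reset batches with a
simple scan. It is a toy model and not the code from the patch. The names
model_zone, fair_pass, reset_batches and model_alloc are hypothetical, and
watermark checks are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a zone; this is NOT the kernel's struct zone. */
struct model_zone {
	int node;            /* NUMA node the zone belongs to */
	long alloc_batch;    /* remaining fair share, stands in for NR_ALLOC_BATCH */
	long batch_size;     /* value the batch is reset to */
	bool fair_depleted;  /* cheap flag consulted instead of the counter */
	unsigned long scans; /* how often the fair pass examined this zone */
};

/*
 * One fair pass over a zonelist that lists local zones first.
 * Returns the index of the zone to allocate from, or -1 if every
 * local zone has used up its batch.
 */
static int fair_pass(struct model_zone *zl, size_t n, int local_node)
{
	for (size_t i = 0; i < n; i++) {
		struct model_zone *z = &zl[i];

		/* Abort the fair policy once remote zones are encountered. */
		if (z->node != local_node)
			break;

		z->scans++;

		/* Depleted zones are skipped without reading the counter. */
		if (z->fair_depleted)
			continue;

		/* Charge the allocation; mark the zone once its batch is gone. */
		if (--z->alloc_batch <= 0)
			z->fair_depleted = true;
		return (int)i;
	}
	return -1;
}

/* Simple reset scan: only local zones are touched. */
static void reset_batches(struct model_zone *zl, size_t n, int local_node)
{
	for (size_t i = 0; i < n; i++) {
		if (zl[i].node != local_node)
			break;
		zl[i].alloc_batch = zl[i].batch_size;
		zl[i].fair_depleted = false;
	}
}

/* Fair attempt, then one reset and retry; -1 models falling out of the fair pass. */
static int model_alloc(struct model_zone *zl, size_t n, int local_node)
{
	int idx;

	assert(zl != NULL);

	idx = fair_pass(zl, n, local_node);
	if (idx < 0) {
		reset_batches(zl, n, local_node);
		idx = fair_pass(zl, n, local_node);
	}
	return idx;
}
```

In this model a remote zone is never examined by the fair pass, which is the
kind of scan the fairzone patch is meant to avoid on NUMA machines.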

-- 
Mel Gorman
SUSE Labs

