Re: Benchmark results: "Enhanced NUMA scheduling with adaptive affinity"

From: Mel Gorman <mgorman@suse.de>
To: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Paul Turner <pjt@google.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Benchmark results: "Enhanced NUMA scheduling with adaptive affinity"
Date: Fri, 16 Nov 2012 14:14:28 +0000
Message-ID: <20121116141428.GZ8218@suse.de>
In-Reply-To: <50A566FA.2090306@redhat.com>

On Thu, Nov 15, 2012 at 05:04:42PM -0500, Rik van Riel wrote:
> On 11/15/2012 03:32 PM, Linus Torvalds wrote:
> >Ugh.
> >
> >According to these numbers, the latest sched-numa actually regresses
> >against mainline on Specjbb.
> >
> >No way is this even close to ready for merging in the 3.8 timeframe.
> >
> >I would ask the involved people to please come up with a set of
> >initial patches that people agree on, so that we can at least start
> >merging some of the infrastructure, and see how far we can get on at
> >least getting *started*. As I mentioned to Andrew and Mel separately,
> >nobody seems to disagree with the TLB optimization patches. What else?
> >Is Mel's set of early patches still considered a reasonable starting
> >point for everybody?
> 
> Mel's infrastructure patches, 1-14 and 17 out
> of his latest series, could be a great starting
> point.
> 

V3 increased a lot in size due to rate-limiting of migration which was
yanked out of autonuma. The rate limiting has two obvious purposes. One,
during periods of fast convergence it will prevent the memory bus being
saturated with traffic and causing stalls. As a side effect it should
decrease system CPU usage in some cases. Two, if the placement policy
completely breaks down, it will help contain the damage. If we added a
vmstat counter that increments when rate limiting kicks in then users
could report broken policies by checking if the migration and rate-limited
counters are increasing. If they are both increasing rapidly then the
placement policy is broken. I think identifying when it's broken is just
as important as identifying when it's working.
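
To make the diagnostic idea above concrete, here is a minimal user-space
sketch in Python. It is not the kernel code from the series: the class, the
field names, the window budget and the threshold are all hypothetical, and
the two counters merely stand in for a pages-migrated count and the proposed
rate-limited vmstat counter.

```python
from dataclasses import dataclass


@dataclass
class MigrationRateLimiter:
    """Windowed limiter: at most max_pages migrations per window (sketch)."""
    max_pages: int             # hypothetical page budget per window
    window_ms: int             # hypothetical window length in milliseconds
    window_start_ms: int = 0
    pages_in_window: int = 0
    migrated: int = 0          # stands in for a pages-migrated counter
    rate_limited: int = 0      # stands in for the proposed vmstat counter

    def try_migrate(self, now_ms: int, nr_pages: int = 1) -> bool:
        # Start a fresh window once the current one has expired.
        if now_ms - self.window_start_ms >= self.window_ms:
            self.window_start_ms = now_ms
            self.pages_in_window = 0
        # Refuse the migration if it would exceed the window budget.
        if self.pages_in_window + nr_pages > self.max_pages:
            self.rate_limited += nr_pages
            return False
        self.pages_in_window += nr_pages
        self.migrated += nr_pages
        return True


def policy_looks_broken(before, after, interval_s, threshold_per_s):
    """Compare two (migrated, rate_limited) snapshots taken interval_s apart.

    Returns True only if BOTH counters grew faster than threshold_per_s,
    which is the "both increasing rapidly" condition described above.
    The threshold is an assumption; the mail does not define "rapidly".
    """
    migrated_rate = (after[0] - before[0]) / interval_s
    limited_rate = (after[1] - before[1]) / interval_s
    return migrated_rate > threshold_per_s and limited_rate > threshold_per_s


if __name__ == "__main__":
    limiter = MigrationRateLimiter(max_pages=2, window_ms=100)
    for t in (0, 10, 20, 100):
        print(t, limiter.try_migrate(t))
    print(limiter.migrated, limiter.rate_limited)
```

With a budget of 2 pages per 100ms window, the third request inside the
window is refused and counted as rate-limited, and the request at 100ms is
accepted because a new window has started. A user would sample the two
counters twice and report a problem only when both are climbing quickly.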

In the new series, the patches equivalent to what Rik suggests above are
Patches 1-17 and 19. I'll swap patches 18 and 19 to avoid this mess.
The TLB patches are 33-35 and are not contested, so I am going to move
them to the start of the series.

With some shuffling, the question of what to consider for merging
becomes

1. TLB optimisation?                                Patches  1-3
2. Stats for migration?                             Patches  4-6
3. Common NUMA infrastructure?                      Patches  7-21
4. Basic fault-driven policy, stats, ratelimits?    Patches 22-35

Patches 36-43 are complete cabbage and should not be considered at this
stage. It should be possible to build the placement policies and the
scheduling decisions from schednuma, autonuma, some combination of the
above or something completely different on top of patches 1-35.

Peter, Ingo, Andrea?

I know there are other common patches that should exist but they are
optimisations to the policies and not fundamental design choices.

> Ingo is trying to get the mm/ code in his tree
> to be mostly the same as Mel's code anyway, so
> that is the infrastructure everybody wants.
> 
> At that point, we can focus our discussions on
> just the policy side, which could help us zoom in
> on the issues.
> 

Preferably yes, and we'd have mainline and the most basic of placement
policies as comparison points to work with, which should be bisectable
as a last resort.

> It would also make it possible for us to do apples
> to apples comparisons between the various policy
> decisions, allowing us to reach a decision based
> on data, not just gut feel.
> 
> As long as each tree has its own basic infrastructure,
> we cannot do apples to apples comparisons; this has
> frustrated the discussion for months.
> 
> Having all that basic infrastructure upstream should
> short-circuit that part of the discussion.
> 

Agreed.

-- 
Mel Gorman
SUSE Labs
