From: Andrea Arcangeli <aarcange@redhat.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <pzijlstr@redhat.com>, Ingo Molnar <mingo@elte.hu>,
Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Hillf Danton <dhillf@gmail.com>,
Andrew Jones <drjones@redhat.com>, Dan Smith <danms@us.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Paul Turner <pjt@google.com>, Christoph Lameter <cl@linux.com>,
Suresh Siddha <suresh.b.siddha@intel.com>,
Mike Galbraith <efault@gmx.de>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH 00/33] AutoNUMA27
Date: Fri, 12 Oct 2012 03:45:53 +0200
Message-ID: <20121012014553.GD1818@redhat.com>
In-Reply-To: <20121011213432.GQ3317@csn.ul.ie>
Hi Mel,
On Thu, Oct 11, 2012 at 10:34:32PM +0100, Mel Gorman wrote:
> So after getting through the full review of it, there wasn't anything
> I could not stand. I think it's *very* heavy on some of the paths like
> the idle balancer which I was not keen on and the fault paths are also
> quite heavy. I think the weight on some of these paths can be reduced
> but not to 0 if the objectives to autonuma are to be met.
>
> I'm not fully convinced that the task exchange is actually necessary or
> beneficial because it somewhat assumes that there is a symmetry between CPU
> and memory balancing that may not be true. The fact that it only considers
The problem is that without an active task exchange, and without an
explicit call to stop_one_cpu*, there's no way to migrate a currently
running task, and clearly we need that. Otherwise we would wait
indefinitely, hoping the task goes to sleep and leaves the CPU idle, or
that a couple of other tasks start and trigger load balance events.
We must be able to move tasks even when all CPUs are in a steady
rq->nr_running == 1 state and no other scheduler balance event would
ever attempt to move tasks around.
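To make the steady-state argument concrete, here is a toy userspace
model. It is only a sketch and not code from the patchset: struct toy_rq,
toy_load_imbalance(), toy_count_remote() and toy_exchange_pass() are
hypothetical names, and the "preferred node" of a task is assumed to be
known already. It shows that a balancer looking only at nr_running sees
nothing to do, while a pairwise exchange of the running tasks still
improves placement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel or the patches. */
#define NR_CPUS 4

struct toy_rq {
	int node;           /* NUMA node this CPU belongs to */
	int nr_running;     /* runnable tasks on this CPU */
	int curr_pref_node; /* node the currently running task would prefer */
};

/*
 * Load-count view: there is work to pull only if some runqueue has at
 * least two more tasks than another. With nr_running == 1 everywhere
 * this is always false.
 */
static bool toy_load_imbalance(const struct toy_rq *rq, size_t n)
{
	int min = rq[0].nr_running, max = rq[0].nr_running;

	for (size_t i = 1; i < n; i++) {
		if (rq[i].nr_running < min)
			min = rq[i].nr_running;
		if (rq[i].nr_running > max)
			max = rq[i].nr_running;
	}
	return max - min > 1;
}

/* Number of running tasks sitting on a node other than the preferred one. */
static int toy_count_remote(const struct toy_rq *rq, size_t n)
{
	int remote = 0;

	for (size_t i = 0; i < n; i++)
		if (rq[i].nr_running > 0 && rq[i].curr_pref_node != rq[i].node)
			remote++;
	return remote;
}

/*
 * Active exchange: swap the running tasks of two CPUs whenever that
 * strictly reduces the number of misplaced tasks in the pair.
 * nr_running is untouched, so the load-count view never changes.
 */
static int toy_exchange_pass(struct toy_rq *rq, size_t n)
{
	int swaps = 0;

	assert(rq != NULL);
	for (size_t a = 0; a < n; a++) {
		for (size_t b = a + 1; b < n; b++) {
			int before = (rq[a].curr_pref_node != rq[a].node) +
				     (rq[b].curr_pref_node != rq[b].node);
			int after = (rq[b].curr_pref_node != rq[a].node) +
				    (rq[a].curr_pref_node != rq[b].node);

			if (after < before) {
				int tmp = rq[a].curr_pref_node;

				rq[a].curr_pref_node = rq[b].curr_pref_node;
				rq[b].curr_pref_node = tmp;
				swaps++;
			}
		}
	}
	return swaps;
}
```

With four CPUs on nodes 0, 0, 1, 1 whose running tasks prefer nodes
1, 0, 0, 1, toy_load_imbalance() reports nothing to balance, yet two
tasks run on the wrong node; one call to toy_exchange_pass() swaps the
tasks of CPU 0 and CPU 2 and leaves none misplaced. The real code
compares per-node weights rather than a single preferred node, so treat
this purely as an illustration of why an explicit exchange is needed.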
Of course one could hack the active idle balancing so that it does the
active NUMA balancing action, but that would be a purely artificial
complication: it would add unnecessary delay and it would provide no
benefit whatsoever.
By the same logic, why don't we dump the active idle balancing too, and
hack the load balancing to do the active idle balancing as well? The
two would of course be more integrated, but it would be a mess and
slower, and there's a good reason why they exist as totally separate
pieces of code working in parallel.
We can integrate it more, but in my view the result would be worse and
more complicated. Last but not least, messing with the idle balancing
code to do an active NUMA balancing action (somehow invoking
stop_one_cpu* in the steady state described above) would force even
cellphones and UP kernels to deal with NUMA code somehow.
> tasks that are currently running feels a bit random but examining all tasks
> that recently ran on the node would be far too expensive so there is no
So far this seems a good tradeoff. Nothing prevents us from scanning
deeper into the runqueues later if we find a way to do that efficiently.
> good answer. You are caught between a rock and a hard place and either
> direction you go is wrong for different reasons. You need something more
I think you described the problem perfectly ;).
> frequent than scans (because it'll converge too slowly) but doing it from
> the balancer misses some tasks and may run too frequently and it's unclear
> how it affects the current load balancer decisions. I don't have a good
> alternative solution for this but ideally it would be better integrated with
> the existing scheduler when there is more data on what those scheduling
> decisions should be. That will only come from a wide range of testing and
> the inevitable bug reports.
>
> That said, this is concentrating on the problems without considering the
> situations where it would work very well. I think it'll come down to HPC
> and anything jitter-sensitive will hate this while workloads like JVM,
> virtualisation or anything that uses a lot of memory without caring about
> placement will love it. It's not perfect but it's better than incurring
> the cost of remote access unconditionally.
Full agreement.
Your detailed full review was much appreciated, thanks!
Andrea