linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Hillf Danton <dhillf@gmail.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Dan Smith <danms@us.ibm.com>, Paul Turner <pjt@google.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	linux-mm@kvack.org, Suresh Siddha <suresh.b.siddha@intel.com>,
	Mike Galbraith <efault@gmx.de>,
	Bharata B Rao <bharata.rao@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 11/39] autonuma: CPU follow memory algorithm
Date: Wed, 28 Mar 2012 20:39:35 +0200	[thread overview]
Message-ID: <20120328183935.GM5906@redhat.com> (raw)
In-Reply-To: <1332933968.2528.26.camel@twins>

Hi,

On Wed, Mar 28, 2012 at 01:26:08PM +0200, Peter Zijlstra wrote:
> Right, so can we agree that the only case where they diverge is single
> processes that have multiple threads and are bigger than a single node (either
> in memory, cputime or both)?

I think it vastly diverges for processes that are smaller than one
node too. 1) your numa/sched goes blind with an almost arbitrary home
node, 2) your migrate-on-fault will be unable to provide an efficient
and steady async background migration.

> I've asked you several times why you care about that one case so much, but
> without answer.

If this case wasn't important to you, you wouldn't need to introduce
your syscalls.

> I'll grant you that unmodified such processes might do better with your
> stuff, however:
> 
>  - your stuff assumes there is a fair amount of locality to exploit.
> 
>    I'm not seeing how this is true in general, since data partitioning is hard
>    and for those problems where its possible people tend to already do so,
>    yielding natural points to add the syscalls.

Later, I plan to detect this and layout interleaved pages
automatically so you don't even need to manually set MPOL_INTERLEAVE.

>  - your stuff doesn't actually nest, since a guest kernel has no clue as to
>    what constitutes a node (or if there even is such a thing) it will randomly
>    move tasks around on the vcpus, with complete disrespect for whatever host
>    vcpu<->page mappings you set up.
> 
>    guest kernels actively scramble whatever relations you're building by
>    scanning, destroying whatever (temporal) locality you think you might
>    have found.

This shall work fine, running AutoNUMA in guest and host. qemu just
need to create a vtopology for the guest that matches the hardware
topology. Hard binds in the guest will also work great (they create
node locality too).

A paravirt layer could also hint the host on the vcpu switches to
shift the host numa stats across but I didn't thought too much on this
possible paravirt numa-sched optimization, it's not mandatory, just an idea.

> Related to this is that all applications that currently use mbind() and
> sched_setaffinity() are trivial to convert.

Too bad firefox isn't using mbind yet. My primary target are the 99%
of apps out there running on a 24way 2 node system or equivalent and
KVM.

I agree converting qemu to the syscalls would be trivial though.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-03-28 18:40 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-26 17:45 [PATCH 00/39] [RFC] AutoNUMA alpha10 Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 01/39] autonuma: make set_pmd_at always available Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 02/39] xen: document Xen is using an unused bit for the pagetables Andrea Arcangeli
2012-03-30 21:40   ` Konrad Rzeszutek Wilk
2012-03-26 17:45 ` [PATCH 03/39] autonuma: define _PAGE_NUMA_PTE and _PAGE_NUMA_PMD Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 04/39] autonuma: x86 pte_numa() and pmd_numa() Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 05/39] autonuma: generic " Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 06/39] autonuma: teach gup_fast about pte_numa Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 07/39] autonuma: introduce kthread_bind_node() Andrea Arcangeli
2012-03-26 18:32   ` Peter Zijlstra
2012-03-27 15:22     ` Andrea Arcangeli
2012-03-27 15:45       ` Peter Zijlstra
2012-03-27 16:04         ` Andrea Arcangeli
2012-03-27 16:19           ` Peter Zijlstra
2012-03-26 17:45 ` [PATCH 08/39] autonuma: mm_autonuma and sched_autonuma data structures Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 09/39] autonuma: define the autonuma flags Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 10/39] autonuma: core autonuma.h header Andrea Arcangeli
2012-03-26 17:45 ` [PATCH 11/39] autonuma: CPU follow memory algorithm Andrea Arcangeli
2012-03-26 18:25   ` Peter Zijlstra
2012-03-26 19:28     ` Rik van Riel
2012-03-26 19:44       ` Andrea Arcangeli
2012-03-26 19:58         ` Linus Torvalds
2012-03-26 20:39           ` Andrea Arcangeli
2012-03-27  8:39             ` Peter Zijlstra
2012-03-27 14:37               ` Andrea Arcangeli
2012-03-27 16:15               ` Andrea Arcangeli
2012-03-28 11:26                 ` Peter Zijlstra
2012-03-28 18:39                   ` Andrea Arcangeli [this message]
2012-03-27 17:09               ` Ingo Molnar
2012-03-26 17:45 ` [PATCH 12/39] autonuma: add page structure fields Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 13/39] autonuma: knuma_migrated per NUMA node queues Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 14/39] autonuma: init knuma_migrated queues Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 15/39] autonuma: autonuma_enter/exit Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 16/39] autonuma: call autonuma_setup_new_exec() Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 17/39] autonuma: alloc/free/init sched_autonuma Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 18/39] autonuma: alloc/free/init mm_autonuma Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 19/39] mm: add unlikely to the mm allocation failure check Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 20/39] autonuma: avoid CFS select_task_rq_fair to return -1 Andrea Arcangeli
2012-03-26 19:36   ` Peter Zijlstra
2012-03-26 20:53     ` Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 21/39] autonuma: fix selecting task runqueue Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 22/39] autonuma: select_task_rq_fair cleanup new_cpu < 0 fix Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 23/39] autonuma: teach CFS about autonuma affinity Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 24/39] autonuma: fix finding idlest cpu Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 25/39] autonuma: fix selecting idle sibling Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 26/39] autonuma: select_idle_sibling cleanup target assignment Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 27/39] autonuma: core Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 28/39] autonuma: follow_page check for pte_numa/pmd_numa Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 29/39] autonuma: default mempolicy follow AutoNUMA Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 30/39] autonuma: call autonuma_split_huge_page() Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 31/39] autonuma: make khugepaged pte_numa aware Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 32/39] autonuma: retain page last_nid information in khugepaged Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 33/39] autonuma: numa hinting page faults entry points Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 34/39] autonuma: reset autonuma page data when pages are freed Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 35/39] autonuma: initialize page structure fields Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 36/39] autonuma: link mm/autonuma.o and kernel/sched/numa.o Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 37/39] autonuma: add CONFIG_AUTONUMA and CONFIG_AUTONUMA_DEFAULT_ENABLED Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 38/39] autonuma: boost khugepaged scanning rate Andrea Arcangeli
2012-03-26 17:46 ` [PATCH 39/39] autonuma: NUMA scheduler SMT awareness Andrea Arcangeli
2012-03-26 18:57   ` Peter Zijlstra
2012-03-27  0:00     ` Andrea Arcangeli
2012-03-28 13:51       ` Andrea Arcangeli
2012-04-03 20:35 ` [PATCH 00/39] [RFC] AutoNUMA alpha10 Srivatsa Vaddagiri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120328183935.GM5906@redhat.com \
    --to=aarcange@redhat.com \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=bharata.rao@gmail.com \
    --cc=danms@us.ibm.com \
    --cc=dhillf@gmail.com \
    --cc=efault@gmx.de \
    --cc=hannes@cmpxchg.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pjt@google.com \
    --cc=riel@redhat.com \
    --cc=suresh.b.siddha@intel.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).