linux-mm.kvack.org archive mirror
From: Nai Xia <nai.xia@gmail.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Hillf Danton <dhillf@gmail.com>, Dan Smith <danms@us.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Paul Turner <pjt@google.com>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	Mike Galbraith <efault@gmx.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Bharata B Rao <bharata.rao@gmail.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	Christoph Lameter <cl@linux.com>, Alex Shi <alex.shi@intel.com>,
	Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Don Morris <don.morris@hp.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH 13/40] autonuma: CPU follow memory algorithm
Date: Sat, 30 Jun 2012 02:09:44 +0800
Message-ID: <4FEDEF68.6000708@gmail.com>
In-Reply-To: <20120629163025.GP6676@redhat.com>



On 2012-06-30 00:30, Andrea Arcangeli wrote:
> Hi Nai,
>
> On Fri, Jun 29, 2012 at 10:11:35PM +0800, Nai Xia wrote:
>> If one process very intensively accesses a small set of pages on this
>> node, but only occasionally accesses a large set of pages on another
>> node, will this algorithm make a very bad judgment? I guess the answer
>> would be: it's possible, and the judgment depends on the racing pattern
>> between the process and your knuma_scand.
>
> Correct: depending on whether knuma_scand/scan_pass_sleep_millisecs is
> more or less occasional than the visits to the large set of pages, it
> may behave differently.

I bet this race is more subtle than that, but since you admit
this judgment is a racing problem, it doesn't matter how subtle
it is.
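
To make the race concrete, here is a toy model. It assumes each page is
marked once per knuma_scand pass and reports at most one hinting fault
until the next pass, so the per-node counters measure how many distinct
pages were touched, not how often. The page counts, the tick unit and
scan_pass_verdict() are all hypothetical, not taken from the patch.

```python
# Hypothetical working-set sizes, in pages.
HOT_PAGES = 16     # small set on node 0, touched on every tick
COLD_PAGES = 1024  # large set on node 1, swept once every cold_period ticks


def scan_pass_verdict(start: int, pass_len: int, cold_period: int) -> int:
    """Return the node with more hinting faults in [start, start + pass_len).

    Toy assumption: every page is marked once per pass and faults at most
    once until the next pass, so the counters record how many distinct
    pages were touched, not how often they were touched.
    """
    faults = [0, 0]
    for t in range(start, start + pass_len):
        # Hot pages are touched on every tick but fault only once per pass.
        faults[0] = HOT_PAGES
        # The cold set is swept only when t is a multiple of cold_period.
        if t % cold_period == 0:
            faults[1] = COLD_PAGES
    return 1 if faults[1] > faults[0] else 0


# Same workload, same pass length: only the phase of the scan differs.
print(scan_pass_verdict(0, 100, 1000))   # 1: the sweep at t=0 is inside the pass
print(scan_pass_verdict(1, 100, 1000))   # 0: the pass [1, 101) misses the sweep
print(scan_pass_verdict(1, 2000, 1000))  # 1: a pass longer than cold_period sees it
```

With the same workload and the same pass length, the verdict flips on
nothing but the phase of the scan relative to the occasional accesses.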

>
> Note that every algorithm will have a limit on how smart it can be.
>
> Just to give a random example: if you look up some pagecache a million
> times and some other pagecache a dozen times, their "aging"
> information in the pagecache will end up identical. Yet we know one
> set of pages is clearly higher priority than the other. We only have so
> many levels of LRUs and so many referenced/active bitflags per
> page. Once you get to the top, all is equal.
>
> Does this mean the "active" list working set detection is useless just
> because we can't differentiate a million lookups on a few pages from
> a dozen lookups on lots of pages?
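
A simplified sketch of that saturation, loosely modeled on the
referenced/active promotion done when a page is accessed (it ignores the
LRU lists themselves, unevictable pages, and reclaim clearing the bits).
mark_accessed() and age_after() are illustrative names only, not kernel
functions.

```python
def mark_accessed(active: bool, referenced: bool):
    """One access to a page, returning the new (active, referenced) bits.

    Simplified promotion: a referenced inactive page is activated and its
    referenced bit cleared; otherwise the referenced bit is set.
    """
    if not active and referenced:
        return True, False
    if not referenced:
        return active, True
    return active, referenced  # already at the top: nothing left to record


def age_after(accesses: int):
    """Aging state of a page that starts inactive and unreferenced."""
    state = (False, False)
    for _ in range(accesses):
        state = mark_accessed(*state)
    return state


print(age_after(12))         # (True, True)
print(age_after(1_000_000))  # (True, True): indistinguishable from 12 lookups
```

After three accesses the two bits are saturated, so any further
frequency information is simply lost.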

I knew you would give us the LRU example. ;D
But unfortunately the LRU approximation cannot justify your case.
There are cases where the LRU approximation behaves very badly,
but plenty of past research has told us that 90% of workloads
conform to this kind of approximation, and every programmer has
even been taught to write LRU-friendly programs.

But we have no idea how well real-world workloads will conform to your
algo, especially with respect to the racing pattern.


>
> Last but not least, in the very example you mention it's not even
> clear whether the process should be scheduled on the CPU where the
> small set of pages is accessed frequently, or on the CPU where the
> large set of pages is accessed occasionally. If the small set of
> pages fits in the 8MBytes of L2 cache, then it's better to put the
> process on the other CPU, where the large set of pages can't fit in the
> L2 cache. Lots of hardware details would have to be evaluated to really
> know the right thing in such a case, even if it were you who had to
> decide.
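
A back-of-the-envelope cost model for this example, with made-up sizes.
It assumes 4KiB pages, that a working set fitting in the 8MBytes of
cache misses only once per page, that a set which doesn't fit misses on
every access, and that the best node is the one sending the fewest
misses to remote memory. memory_misses() and best_node() are
hypothetical helpers; real hardware (prefetching, associativity,
interconnect bandwidth) is far messier.

```python
CACHE_BYTES = 8 * 1024 * 1024  # the 8MBytes of L2 cache from the example
PAGE_BYTES = 4096              # assumed 4KiB pages


def memory_misses(pages: int, accesses: int) -> int:
    """Accesses to one working set that miss the cache (toy model).

    A set that fits in the cache misses once per page and then stays hot;
    a set that doesn't fit is assumed to miss on every access.
    """
    if pages * PAGE_BYTES <= CACHE_BYTES:
        return min(pages, accesses)
    return accesses


def best_node(hot_pages, hot_accesses, cold_pages, cold_accesses) -> int:
    """0 = run next to the hot (small) set, 1 = next to the cold (large) set.

    Picks the node that sends fewer cache misses to remote memory.
    """
    remote_if_on_0 = memory_misses(cold_pages, cold_accesses)
    remote_if_on_1 = memory_misses(hot_pages, hot_accesses)
    return 0 if remote_if_on_0 <= remote_if_on_1 else 1


# 64KiB hot set, a million accesses; 400MiB cold set, 5000 accesses.
print(best_node(16, 10**6, 102_400, 5_000))    # 1: the hot set lives in cache
# Same access counts, but the hot set is now 16MiB and no longer fits.
print(best_node(4096, 10**6, 102_400, 5_000))  # 0
```

Only the size of the hot set changed between the two calls, yet the
answer flips.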

That's exactly why I think it is more subtle, and why I don't feel
confident about your algo: its effectiveness depends on so many
uncertain things.

>
> But the real reason why the above isn't an issue, and why we don't need
> to solve that problem perfectly, is that there's not just a CPU follow
> memory algorithm in AutoNUMA. There's also the memory follow CPU
> algorithm. AutoNUMA will do its best to change the layout of your
> example to one that has only one clear solution: the occasional lookups
> of the large set of pages will eventually make those pages move to the
> node holding the small set of pages (or the other way around), and
> this is how it's solved.
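
A toy simulation of the two algorithms working together, serialising the
CPU decision and the page migration within each pass for clarity.
converge() and its migrate_budget parameter are hypothetical: the sketch
ignores whatever filtering or rate limiting the patch set actually
applies before migrating a page, and uses a fixed per-pass budget
instead.

```python
def converge(task_node, page_nodes, passes, migrate_budget, nr_nodes=2):
    """Toy model: alternate CPU-follows-memory and memory-follows-CPU.

    page_nodes holds the node of every page the task touches in each pass.
    Returns the task's node and the page layout after `passes` passes.
    """
    pages = list(page_nodes)
    for _ in range(passes):
        # Each page reports one hinting fault per pass on its current node.
        faults = [0] * nr_nodes
        for nid in pages:
            faults[nid] += 1
        # CPU follows memory: run the task where most faults came from
        # (ties go to the lowest node id).
        task_node = max(range(nr_nodes), key=lambda n: faults[n])
        # Memory follows CPU: migrate up to migrate_budget remote pages.
        budget = migrate_budget
        for i, nid in enumerate(pages):
            if budget == 0:
                break
            if nid != task_node:
                pages[i] = task_node
                budget -= 1
    return task_node, pages


# The example above: 16 hot pages on node 0, 1024 cold pages on node 1.
node, layout = converge(0, [0] * 16 + [1] * 1024, passes=3, migrate_budget=256)
print(node, layout.count(0), layout.count(1))  # 1 0 1040
```

In this model the layout ends up on a single node either way; with a
more balanced start such as 600 pages on node 0 and 500 on node 1, the
pages drift to node 0 instead.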

Not sure I follow. If you fall back on this, then why all the complexity?
This fallback is equivalent to a "just group all the pages on the node
where the task is running" policy.


>
> In any case, whatever wrong decision it takes, it will at least be
> a better decision than numa/sched, where there's absolutely zero
> information about what pages the process is accessing. And best of all,
> with AutoNUMA you also know which pages the _thread_ is accessing, so
> it will also be able to make optimal decisions if there are more
> threads than CPUs in a node (as long as not all thread accesses are
> shared).
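
A minimal illustration of why per-thread statistics matter, with
hypothetical fault counts: two threads with opposite preferences look
like a tie when summed per process, but each has a clear preferred node
on its own. clear_winner() is an illustrative helper, not the AutoNUMA
code.

```python
from collections import Counter


def clear_winner(faults: Counter):
    """Node with the most hinting faults, or None on a tie or no data."""
    ranked = faults.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical per-thread hinting-fault counts, keyed by node id.
threads = {
    "A": Counter({0: 900, 1: 100}),
    "B": Counter({0: 100, 1: 900}),
}
per_process = sum(threads.values(), Counter())

print(clear_winner(per_process))                         # None: 1000 vs 1000
print({t: clear_winner(f) for t, f in threads.items()})  # {'A': 0, 'B': 1}
```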

Yeah, we need the information. But how to make the best use of it
is a big problem.
I feel you cannot answer my question with verbal reasoning alone
if you don't currently have in hand a survey of the common page access
patterns of real-world workloads.

Maybe the assumption of your algorithm is right, maybe not...


>
> Hope this explains things better.
> Andrea


Thanks,

Nai

