public inbox for linux-kernel@vger.kernel.org
From: Zoltan Menyhart <Zoltan.Menyhart_AT_bull.net@nospam.org>
To: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: haveblue@us.ibm.com, linux-ia64@vger.kernel.org,
	linux-kernel@vger.kernel.org, iwamoto@valinux.co.jp
Subject: Re: Migrate pages from a ccNUMA node to another - patch
Date: Mon, 05 Apr 2004 17:07:42 +0200	[thread overview]
Message-ID: <4071763E.7293CFCC@nospam.org> (raw)
In-Reply-To: 20040403.115833.74749140.taka@valinux.co.jp

Hirokazu Takahashi wrote:

> I guess arguments src_node, mm and pte would be redundant since
> they can be looked up from old_p with the reverse mapping scheme.

In my version 0.2, I can do with only the following arguments:
 *		node:	Destination NUMA node
 *		mm:	-> victim "mm_struct"
 *		pte:	-> PTE of the page to be moved
(If I have "mm" at hand, why not use it? Why not avoid fetching the r-map
page struct?)
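
For concreteness, the interface described above might look something like the sketch below. The function name is my own invention for illustration; only the three arguments and the NULL-pte convention come from the mail, and the kernel types are the usual 2.6-era ones:

```c
/*
 * Interface sketch only - the name "migrate_page_to_node" is
 * illustrative, not taken from the actual patch.
 *
 * node: destination NUMA node
 * mm:   -> victim "mm_struct"
 * pte:  -> PTE of the page to be moved, or NULL if not known a priori
 *       (in which case mm->pgd has to be scanned, see below)
 */
int migrate_page_to_node(int node, struct mm_struct *mm, pte_t *pte);
```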

> >Notes: "pte" can be NULL if I do not know it a priori.
> >       I cannot release "mm->page_table_lock"; otherwise I have to re-scan the "mm->pgd".
> 
> A re-scan policy would be much better, since migrating pages is heavy work.
> I don't think that holding mm->page_table_lock for a long time would be
> a good idea.

Re-scanning is a "cache killer", at least on IA64 with a huge user memory size.
I have more than 512 Mbytes of user memory, and its PTEs do not fit into the L2 cache.

In my current design, the outer loops walk the PGD, PMD and PTE; once I find a
valid PTE, the inner loop checks it against the list of at most 2048 physical
addresses.
I tried reversing them: walking the list of at most 2048 physical addresses as
the outer loop, with the PGD - PMD - PTE scans as the inner loops, resulted in
4 to 5 times slower migration.

> How do you think about the following algorithm:
>   1. get mm->page_table_lock
>   2. choose some pages
>   3. release mm->page_table_lock
>   4. call remap_onepage() against each page
>   5. go to step 1 if there remain pages to be migrated

I want to move the most frequently used pages - at least with the HW-assisted
hot page detection.
I take "mm->page_table_lock" and nuke the PTE. There is a good chance that the
CPU using the page observes a page fault almost immediately. It enters the page
fault handler and gets blocked by "mm->page_table_lock". If I released the lock,
that CPU could continue and realize that there is nothing to do - the page fault
has already been repaired - and in the meantime it would be me waiting for
"mm->page_table_lock". At worst this scenario happens 2048 times.
If I keep the lock, the victim CPU enters the page fault handler only once.

I think what we should do is "pull" pages into a node rather than "push them
out", for two reasons:
- the recipient CPU executes the migration instead of busy-waiting for the lock
- there is a chance that the recipient CPU will find the migrated data useful
  in its cache

Regards,

Zoltán Menyhárt


Thread overview: 17+ messages
2004-03-26  9:18 Migrate pages from a ccNUMA node to another - patch Zoltan Menyhart
2004-03-26 17:20 ` Dave Hansen
2004-03-30  8:27   ` IWAMOTO Toshihiro
2004-03-30  9:05     ` Hirokazu Takahashi
2004-03-30 11:20       ` Zoltan Menyhart
2004-03-30 12:08         ` Hirokazu Takahashi
2004-03-30 14:32           ` Zoltan Menyhart
2004-04-03  2:58             ` Hirokazu Takahashi
2004-04-05 15:07               ` Zoltan Menyhart [this message]
2004-04-05 15:40                 ` Dave Hansen
2004-04-06 14:42                   ` Migrate pages from a ccNUMA node to another Zoltan Menyhart
2004-04-08 13:32                 ` Migrate pages from a ccNUMA node to another - patch Hirokazu Takahashi
2004-03-30 11:39   ` Zoltan Menyhart
2004-03-30 15:18     ` Dave Hansen
2004-03-30 15:58     ` Dave Hansen
2004-03-30 16:37       ` Dave Hansen
2004-04-01  8:44       ` Migrate pages from a ccNUMA node to another Zoltan Menyhart
