Re: Migrate pages from a ccNUMA node to another - patch

From: Dave Hansen <haveblue@us.ibm.com>
To: Zoltan.Menyhart@bull.net
Cc: Hirokazu Takahashi <taka@valinux.co.jp>,
	linux-ia64@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	iwamoto@valinux.co.jp
Subject: Re: Migrate pages from a ccNUMA node to another - patch
Date: Mon, 05 Apr 2004 15:40:33 +0000
Message-ID: <1081179633.8956.2992.camel@nighthawk>
In-Reply-To: <4071763E.7293CFCC@nospam.org>

On Mon, 2004-04-05 at 08:07, Zoltan Menyhart wrote:
> Hirokazu Takahashi wrote:
> 
> > I guess the arguments src_node, mm and pte would be redundant since
> > they can be looked up from old_p with the reverse mapping scheme.
> 
> In my version 0.2, I can do with only the following arguments:
>  *		node:	Destination NUMA node
>  *		mm:	-> victim "mm_struct"
>  *		pte:	-> PTE of the page to be moved
> (If I have "mm" at hand, why not use it? Why not avoid fetching the r-map
> page struct?)

That's a good point.  There is at least some cost (at least 1 lock)
associated with walking the rmap chains.  If it can be avoided, it might
as well be.  

But, if someone needs the "no walk" interface, just wrap the function:

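/* Wrapper: looks up the rmap data from the page, then calls the worker. */
foo(page)
{
	rmap_results = get_rmap_stuff(page);
	__foo(page, rmap_results);
}

/* Worker: takes rmap data the caller already has, so it does no lookup itself. */
__foo(page, rmap_results)
{
...
}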
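
A slightly fuller sketch of that wrapper pattern, in case it helps.  Every
name here other than get_rmap_stuff() and rmap_results is made up for
illustration (struct mm_sketch, struct page_sketch, migrate_with_rmap(),
migrate_page_only(), the rmap_lookups counter), and none of it is a real
kernel interface.  The lookup is a stand-in that only counts how often it
runs, and the actual page copy is left out:

```c
#include <assert.h>
#include <stddef.h>

/* All types and names below are hypothetical stand-ins, not kernel APIs. */
struct mm_sketch { int id; };

struct page_sketch {
	int node;               /* NUMA node the page currently lives on */
	struct mm_sketch *mm;   /* what a reverse-mapping walk would find */
	unsigned long *pte;
};

struct rmap_results {
	struct mm_sketch *mm;
	unsigned long *pte;
};

int rmap_lookups;           /* counts how often the costly lookup runs */

/* Stand-in for the rmap walk; in the kernel this costs at least one lock. */
struct rmap_results get_rmap_stuff(struct page_sketch *page)
{
	struct rmap_results r = { page->mm, page->pte };

	rmap_lookups++;
	return r;
}

/* Worker: the caller supplies mm and pte, so no lookup happens here. */
int migrate_with_rmap(struct page_sketch *page, struct rmap_results r, int node)
{
	assert(page != NULL);
	if (r.mm == NULL || r.pte == NULL)
		return -1;          /* nothing mapped, nothing to move */
	page->node = node;      /* the real copy and PTE update are elided */
	return 0;
}

/* Wrapper: only the page is known, so pay for the lookup once, then delegate. */
int migrate_page_only(struct page_sketch *page, int node)
{
	return migrate_with_rmap(page, get_rmap_stuff(page), node);
}
```

A caller that already holds mm and pte would call migrate_with_rmap()
directly and never touch the lookup; a caller that only has the page would
use migrate_page_only() and pay for it once.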

> > >Notes: "pte" can be NULL if I do not know it a priori
> > >       I cannot release "mm->page_table_lock", otherwise I have to re-scan the "mm->pgd".
> > 
> > A re-scan policy would be much better, since migrating pages is heavy work.
> > I don't think that holding mm->page_table_lock for a long time would be
> > a good idea.
> 
> Re-scanning is a "cache killer", at least on IA64 with huge user memory size.
> I have more than 512 Mbytes of user memory and its PTEs do not fit into the L2 cache.
> 
> In my current design, the outer loops walk the PGD, PMD and PTE levels; once
> I find a valid PTE, I check it against the list of max. 2048 physical addresses in
> the inner loop.
> When I reversed them (the list of max. 2048 physical addresses as the outer
> loop and the PGD - PMD - PTE scans as the inner loops), migration became 4 to 5
> times slower.
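
To make sure the two loop orders in the quoted text are understood the same
way, here is a rough userspace sketch.  It makes several assumptions that
are not in the original mail: a flat array stands in for the PGD/PMD/PTE
hierarchy, an entry of 0 means "not valid", each entry's value stands in
for a physical address, and every target address is nonzero, distinct, and
mapped at most once.  The function names and the pte_reads counter are
invented for illustration.  It counts page-table reads only and does not
model cache behaviour, so it cannot reproduce the 4 to 5 times figure:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Page-table walk as the outer loop: the table is read exactly once,
 * and each valid entry is compared against the short target list.
 */
size_t scan_walk_outer(const unsigned long *ptes, size_t n_ptes,
		       const unsigned long *targets, size_t n_targets,
		       size_t *pte_reads)
{
	size_t hits = 0;

	for (size_t i = 0; i < n_ptes; i++) {
		unsigned long pte = ptes[i];

		(*pte_reads)++;
		if (pte == 0)
			continue;       /* not a valid entry */
		for (size_t j = 0; j < n_targets; j++) {
			if (targets[j] == pte) {
				hits++;
				break;
			}
		}
	}
	return hits;
}

/*
 * Target list as the outer loop: the table is walked again for every
 * target, so its entries are read many times over.
 */
size_t scan_list_outer(const unsigned long *ptes, size_t n_ptes,
		       const unsigned long *targets, size_t n_targets,
		       size_t *pte_reads)
{
	size_t hits = 0;

	for (size_t j = 0; j < n_targets; j++) {
		for (size_t i = 0; i < n_ptes; i++) {
			(*pte_reads)++;
			if (ptes[i] == targets[j]) {
				hits++;
				break;  /* assumed: mapped at most once */
			}
		}
	}
	return hits;
}
```

Both orders find the same pages under those assumptions.  The difference is
that the first reads each table entry once, while the second re-reads the
table for every target, which is where a table too large for the cache
would hurt.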

Could you explain where you're getting these "magic numbers?"  I don't
quite understand the significance of 2048 physical addresses or 512 MB
of memory.

Zoltan, it appears that we have a bit of an inherent conflict with how
much CPU each of you is expecting to use in the removal and migration
cases.  You're coming from an HPC environment where each CPU cycle is
valuable, while the people trying to remove memory are probably going to
be taking CPUs offline soon anyway, and care a bit less about how
efficient they're being with CPU and cache resources.  

Could you be a bit more explicit about how expensive (cpu-wise) these
migrate operations can be?

-- Dave
