Re: Migrate pages from a ccNUMA node to another

From: Zoltan Menyhart <Zoltan.Menyhart_AT_bull.net@nospam.org>
To: Robin Holt in <holt@sgi.com>
Cc: linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Migrate pages from a ccNUMA node to another
Date: Fri, 26 Mar 2004 12:38:52 +0000
Message-ID: <4064245C.50B74C67@nospam.org>
In-Reply-To: 20040326103959.GB14360@lnx-holt

Robin Holt wrote:
> 
> We have found that "automatic" migration tends to result in the
> system deciding to move the wrong pieces around.  Since applications
> can be so varied, I would recommend we let the application decide
> when it thinks it is beneficial to move a memory range to a nearby
> node.

I am not saying it is for every application
(see the paragraph of the "if's").
There are a couple of applications which run for a long time with
relatively stable memory working sets, and I can help them.
You launch your application with and without migration, and you use
it if you gain enough.

> The placement policy doesn't really fit the bill entirely.  We are
> currently tracking a problem with repeatability of a benchmark.  We
> found that the newer libc we are using used to result in a newly
> forked process touching a page before the parent did and therefore
> the page, which had been marked COW, would, on the old libc, end up
> on the child's node for the child and the parent's node for the parent.
> After the update, both pages ended up on the parent's.

I haven't modified anything in the existing page fault handler,
nor have I changed the placement policy.
With my proposed syscall, you need to specify explicitly where the
pages go.

> If your syscall would simply do the copy to the destination node
> for COW pages, this would have worked terrifically in both cases.

The COW pages are referenced by more than one PGD (that of the
parent and those of its children). As I state in RESTRICTIONS, I skip
these pages.

I think this issue with the COW pages is a fork() - exec()
placement problem; I do not address it with my stuff.
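
To picture the skip rule for shared pages, here is a toy user-space
model. It is not the patch itself: the Page class, its fields and
migrate_private_pages() are hypothetical names made up for this
illustration, and "ref_count" merely stands in for "number of PGDs
referencing the page".

```python
from dataclasses import dataclass


@dataclass
class Page:
    addr: int       # virtual address of the page (illustrative only)
    node: int       # NUMA node currently holding the page
    ref_count: int  # hypothetical: how many PGDs reference the page


def migrate_private_pages(pages, dest_node):
    """Move pages referenced by a single PGD; skip shared (e.g. COW) pages."""
    moved, skipped = [], []
    for page in pages:
        if page.ref_count > 1:
            # Shared between parent and children: left where it is.
            skipped.append(page.addr)
        elif page.node != dest_node:
            page.node = dest_node
            moved.append(page.addr)
        # Private pages already on dest_node need no action.
    return moved, skipped
```

A page still shared after fork() stays on its current node in this
model, which is why the fork() - exec() placement case is out of scope.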

> >
> > 3. NUMA aware scheduler
> > .......................
> >
> 
> Back to my earlier comment about magic.  This is a second tier of
> magic.  Here we are talking about inferring a reason to migrate based
> on memory access patterns, but what if that migration results in
> some other process being hurt more than this one is helped.
> 
> Honestly, we have beaten on the scheduler quite a bit and the "allocate
> memory close to my node" has helped considerably.
> 
> One thing that would probably help considerably, in addition to the
> syscall you seem to be proposing, would be an addition to the
> task_struct.  The new field would specify which node to attempt
> allocations on.  Before doing a fork, the parent would do a
> syscall to set this field to the node the child will target.  It
> would then call fork.  The PGDs et al and associated memory, including
> the task struct and pages would end up being allocated based upon
> that numa node's allocation preference.
> 
> What do you think of combining these two items into a single syscall?

I can agree with Robin Holt, it's a NUMA API issue.
I just give a tool: if someone somehow knows that this piece of memory
would be better on another node, I can move it there.

> > NAME
> >         migrate_ph_pages        - migrate pages to another NUMA node
> 
> At first, I thought "Wow, this could result in some nice admin tools."
> The more I scratch my head on this, the less useful I see it, but
> would not argue against it.

We are working on the prototype of a device driver to read out the
"hot page" counters on the n-th Scalable Node Controller
(say: "/dev/snc/n/hotpage").
An "artificial intelligence" can then guess what to move and call this
service.
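
A very rough sketch of what such a guessing policy might look like in
user space is given below. Everything in it is an assumption: the
format of the hot page counters is not defined in this message, so the
sketch simply takes per-page, per-node access counts as a dictionary,
and pick_migrations(), min_ratio and min_hits are invented names and
thresholds.

```python
def pick_migrations(hot_counts, home_node, min_ratio=2.0, min_hits=100):
    """Suggest (page_addr, dest_node) pairs for pages used mostly remotely.

    hot_counts: {page_addr: {node: access_count}}  (assumed counter format)
    home_node:  {page_addr: node currently holding the page}
    """
    plan = []
    for addr, per_node in sorted(hot_counts.items()):
        if not per_node:
            continue  # no samples for this page
        here = home_node[addr]
        local = per_node.get(here, 0)
        best_node, best = max(per_node.items(), key=lambda kv: kv[1])
        # Suggest a move only if a remote node clearly dominates.
        if (best_node != here and best >= min_hits
                and best >= min_ratio * max(local, 1)):
            plan.append((addr, best_node))
    return plan
```

The returned list would then be fed, page by page, to the migration
service with the destination node given explicitly.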

BTW, does anyone have a machine with a chipset other than the i82870?

Thanks,

Zoltan Menyhart
