From: Ray Bryant <raybry@sgi.com>
To: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>, Ray Bryant <raybry@austin.rr.com>,
	linux-mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II
Date: Fri, 18 Feb 2005 10:58:59 -0600
Message-ID: <42161ED3.1050400@sgi.com>
In-Reply-To: <20050217235437.GA31591@wotan.suse.de>

Here's an interface proposal that may be a middle ground and
should satisfy both small and large system requirements:

The system call interface would be:

page_migrate(pid, va_start, va_end, count, old_node_list, new_node_list);

(i.e., same as before, but please keep reading...):

The following restrictions of my original proposal would be
dropped:

(1)  va_start and va_end can span multiple vma's.  To migrate
      all pages in a process, va_start can be 0UL and va_end
      would be MAX_INT L.  (Equivalently, we could use va_start
      and length, in pages....)  We would expect the normal usage
      of this call on small systems to be va_start=0, va_end=MAX_INT.
      va_start and va_end would be required to be page aligned.

(2)  There is no requirement that the pid be suspended before
      the system call is issued.  Further requirements below
      are proposed to handle the allocation of new pages while
      the migrate system call is in progress.

(3)  Mempolicy data structures will be updated to reflect the
      new node locations before any pages are migrated.  That
      way, if the process allocates new pages before the migration
      process is completed, they will be allocated on the new
      nodes.

      (An alternative:  we could require the user to update
      the NUMA API data structures to reflect the new reality
      before the page_migrate() call is issued.  This is consistent
      with item (4).  If the user doesn't do this, then
      there is no guarantee that the page migration call will
      actually be able to migrate all pages.)

      If any memory policy is DEFAULT, then the pid will need to
      be migrated to a cpu associated with one of the new_node_list
      nodes before the page_migrate() call.  This is so new
      allocations will happen in the new_node_list and the
      migration call won't miss those pages.  The system call
      will work correctly without this; it just can't guarantee
      that it will migrate all pages from the old_node_list.

(4)  If cpusets are in use, the new_node_list must represent
      valid nodes to allocate pages from for the cpuset that
      pid is currently a member of.  This implies that the
      pid is moved from its old cpuset to a new cpuset before
      the page_migrate() call is issued.  Any nodes not part
      of the new cpuset will cause the system call to return
      with -EINVAL.

(5)  If, during the migration process, a page is to be moved to
      node N, but the alloc_pages_node() call for node N fails, then the
      page will fall over to allocation on the "nearest" node
      in the new_node_list; if this node is full then fall over
      to the next nearest node, etc.  If none of the nodes has
      space, then the migration system call will fail.  (Hmmm...
      would we unmigrate the pages that had been migrated
      this far??  sounds messy.... also, not sure what one
      would do about error reporting here so that the caller
      could take some corrective action.)

(6)  The system call is reserved to root or a pid with
      capability CAP_PAGE_MIGRATE.

(7)  Mapped files with the extended attribute MIGRATE
      set to NONE are not migrated by the system call.
      Mapped files with the extended attribute MIGRATE
      set to LIB will be handled as follows:  r/o
      mappings will not be migrated.  r/w mappings will
      be migrated.  If no MIGRATE extended attribute is available,
      then the assumption is that the MIGRATE extended
      attribute is not set.  (Files mapped from NFS
      would always be regarded as migrateable until
      NFS gets extended attributes.)
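
To make the fall-over rule in item (5) concrete, here is a rough
userspace model of the node choice.  It is only a sketch, not kernel
code: pick_target_node(), the dist[][] distance table, the free_pages[]
counters and MAX_NODES are all names invented for this illustration,
and the preferred node is assumed to be a member of new_node_list.

```c
#include <stddef.h>

#define MAX_NODES 8   /* assumed upper bound, illustration only */

/*
 * Hypothetical model of item (5): try the preferred node first, then
 * fall over to the nearest node in new_nodes that still has space.
 * dist[a][b] is an assumed NUMA distance table and free_pages[n] an
 * assumed per-node free-page count.
 * Returns the chosen node, or -1 if no node in new_nodes has space
 * (the case where the migration call would fail).
 */
static int pick_target_node(int preferred,
                            const int *new_nodes, size_t count,
                            const int dist[MAX_NODES][MAX_NODES],
                            const long free_pages[MAX_NODES])
{
    int best = -1;

    /* The preferred node wins whenever it still has a free page. */
    if (free_pages[preferred] > 0)
        return preferred;

    /* Otherwise pick the closest candidate that has room. */
    for (size_t i = 0; i < count; i++) {
        int n = new_nodes[i];

        if (free_pages[n] <= 0)
            continue;
        if (best < 0 || dist[preferred][n] < dist[preferred][best])
            best = n;
    }
    return best;
}
```

A return of -1 corresponds to the open question above: the caller
still has to decide what to do about pages that were already moved.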

Note that nothing here requires parsing of /proc/pid/maps,
etc.  However, very large systems may use the system call
in special ways, e.g.:

(1)  They may decide to suspend processes before migration.
(2)  They may decide to optimize the migration process by
      trying to migrate large shared objects only "once",
      in the sense that only one scan of a large shared
      object will be done.

Issues of complexity related to the above are reserved for
those systems that choose to use the system call in this way.

Please note, however, that this is a performance optimization
that some systems MAY decide to do.  There is NO REQUIREMENT
that any user follow these steps from a correctness point of
view; the page_migrate() system call will still do the correct
thing.
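
For the simple small-system case, the caller side might look roughly
like the sketch below.  Since page_migrate() is only a proposal, the
call is modeled by a stub: page_migrate_stub(), migrate_whole_process(),
struct migrate_args and SKETCH_PAGE_SIZE are names made up for this
example, the argument types are assumptions, and the highest
page-aligned address stands in for the "MAX_INT" upper bound.

```c
#include <errno.h>
#include <limits.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size, illustration only */

/* Arguments of the most recent call, recorded so the sketch runs anywhere. */
struct migrate_args {
    int pid;
    unsigned long va_start, va_end;
    int count;
};
static struct migrate_args last_call;

/*
 * Hypothetical stand-in for the proposed page_migrate() system call.
 * It only enforces the page-alignment rule from item (1) and records
 * its arguments; no pages are moved.
 */
static int page_migrate_stub(int pid, unsigned long va_start,
                             unsigned long va_end, int count,
                             const int *old_node_list,
                             const int *new_node_list)
{
    (void)old_node_list;
    (void)new_node_list;

    if (va_start % SKETCH_PAGE_SIZE || va_end % SKETCH_PAGE_SIZE)
        return -EINVAL;

    last_call.pid = pid;
    last_call.va_start = va_start;
    last_call.va_end = va_end;
    last_call.count = count;
    return 0;
}

/* Small-system usage: one call covering the whole address space. */
static int migrate_whole_process(int pid, int count,
                                 const int *old_node_list,
                                 const int *new_node_list)
{
    /* Highest page-aligned address, standing in for the upper bound. */
    unsigned long va_end = ULONG_MAX & ~(SKETCH_PAGE_SIZE - 1);

    return page_migrate_stub(pid, 0UL, va_end, count,
                             old_node_list, new_node_list);
}
```

Note that the caller never looks at /proc/pid/maps and never suspends
the target; it just names the pid and the two node lists.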

Now, I know that is complicated and a lot of verbiage.  But this
would satisfy our requirements and I think it would address
the concern that the page_migrate() call was built just to
satisfy SGI requirements.

Comments, flames, suggestions, etc, as usual are all welcome.
-- 
-----------------------------------------------
Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
	 so I installed Linux.
-----------------------------------------------

