From: Gregory Price <gourry.memverge@gmail.com>
To: linux-mm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, linux-cxl@vger.kernel.org,
	luto@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	arnd@arndb.de, akpm@linux-foundation.org, x86@kernel.org,
	Gregory Price <gregory.price@memverge.com>
Subject: [RFC] man/move_phys_pages: migrate pages based on physical address
Date: Tue, 19 Sep 2023 19:09:09 -0400
Message-ID: <20230919230909.530174-7-gregory.price@memverge.com>
In-Reply-To: <20230919230909.530174-1-gregory.price@memverge.com>

Draft man page for the move_phys_pages syscall proposed in the RFC:

https://lore.kernel.org/all/20230907075453.350554-1-gregory.price@memverge.com/

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 man2/move_phys_pages.2 | 180 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)
 create mode 100644 man2/move_phys_pages.2

diff --git a/man2/move_phys_pages.2 b/man2/move_phys_pages.2
new file mode 100644
index 000000000..4f4b68915
--- /dev/null
+++ b/man2/move_phys_pages.2
@@ -0,0 +1,180 @@
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft-2-para
+.\"
+.\" This manpage is Copyright (C) 2006 Silicon Graphics, Inc.
+.\"                               Christoph Lameter
+.\" This manpage is Copyright (C) 2023 MemVerge, Inc.
+.\"                               Gregory Price
+.\"
+.\"
+.TH move_phys_pages 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+move_phys_pages \- move individual physically-addressed pages to another node
+.SH LIBRARY
+NUMA (Non-Uniform Memory Access) policy library
+.RI ( libnuma ", " \-lnuma )
+.SH SYNOPSIS
+.nf
+.B #include <numaif.h>
+.PP
+.BI "long move_phys_pages(unsigned long " count ", \
+uint64_t *" pages [. count ],
+.BI "                const int " nodes [. count "], int " status [. count "], \
+int " flags );
+.fi
+.SH DESCRIPTION
+.BR move_phys_pages ()
+moves the specified
+.I physical pages
+to the memory nodes specified by
+.IR nodes .
+The result of the move is reflected in
+.IR status .
+The
+.I flags
+indicate constraints on the pages to be moved.
+.PP
+This interface requires the
+.B CAP_SYS_ADMIN
+capability.
+.PP
+.I count
+is the number of pages to move.
+It defines the size of the three arrays
+.IR pages ,
+.IR nodes ,
+and
+.IR status .
+.PP
+.I pages
+is an array of physical addresses of the pages that should be moved.
+Each address should be aligned to a page boundary.
+.PP
+.I nodes
+is an array of integers that specify the desired location for each page.
+Each element in the array is a node number.
+.I nodes
+can also be NULL, in which case
+.BR move_phys_pages ()
+does not move any pages but instead will return the node
+where each page currently resides, in the
+.I status
+array.
+Obtaining the status of each page may be necessary to determine
+pages that need to be moved.
+.PP
+.I status
+is an array of integers that return the status of each page.
+The array contains valid values only if
+.BR move_phys_pages ()
+did not return an error.
+Preinitializing the array to a value
+that cannot represent a real NUMA node or a valid status-array error
+can help to identify pages that have been migrated if a partial
+failure occurs.
+.PP
+.I flags
+specify what types of pages to move.
+.B MPOL_MF_MOVE
+means that only pages that are in exclusive use by a process
+are to be moved.
+.B MPOL_MF_MOVE_ALL
+means that pages shared between multiple processes can also be moved.
+.SS Page states in the status array
+The following values can be returned in each element of the
+.I status
+array.
+.TP
+.B 0..MAX_NUMNODES
+Identifies the node on which the page resides.
+.TP
+.B \-EACCES
+The target node for the page is not in the intersection of the sets of
+allowed nodes of all tasks mapping the address.
+At least one task mapping the address does not allow memory on the
+target node.
+.TP
+.B \-EBUSY
+The page is currently busy and cannot be moved.
+Try again later.
+This occurs if a page is undergoing I/O or another kernel subsystem
+is holding a reference to the page.
+.TP
+.B \-EFAULT
+This is a zero page, the memory area is not mapped by the process,
+or the memory is not migratable.
+.TP
+.B \-EIO
+Unable to write back a page.
+The page has to be written back
+in order to move it since the page is dirty and the filesystem
+does not provide a migration function that would allow the move
+of dirty pages.
+.TP
+.B \-EINVAL
+A dirty page cannot be moved.
+The filesystem does not
+provide a migration function and has no ability to write back pages.
+.TP
+.B \-ENOENT
+The physical page is not online or the page is not present in any VMA.
+.TP
+.B \-ENOMEM
+Unable to allocate memory on target node.
+.SH RETURN VALUE
+On success
+.BR move_phys_pages ()
+returns zero.
+.\" FIXME . Is the following quite true: does the wrapper in numactl
+.\" do the right thing?
+On error, it returns \-1, and sets
+.I errno
+to indicate the error.
+If a positive value is returned, it is the number of
+nonmigrated pages.
+.SH ERRORS
+.TP
+.B Positive value
+The number of nonmigrated pages if they were the result of nonfatal
+reasons.
+.TP
+.B EFAULT
+Parameter array could not be accessed.
+.TP
+.B EINVAL
+The flag value was not 0 (Linux 6.6), or an attempt was made to
+migrate pages of a kernel thread.
+.TP
+.B ENODEV
+One of the target nodes is not online.
+.TP
+.B EPERM
+The caller has insufficient privileges
+.RB ( CAP_SYS_ADMIN ).
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux X.Y.Z
+.SH NOTES
+For information on library support, see
+.BR numa (7).
+.PP
+Use of this function may result in pages whose location
+(node) violates the memory policy established for the
+specified addresses (see
+.BR mbind (2))
+and/or the specified process (see
+.BR set_mempolicy (2)).
+That is, memory policy does not constrain the destination
+nodes used by
+.BR move_phys_pages ().
+.PP
+The
+.I <numaif.h>
+header is not included with glibc; it is provided by the
+.I libnuma\-devel
+package or a similar package.
+.SH SEE ALSO
+.BR mbind (2),
+.BR numa (3),
+.BR numa_maps (5),
+.BR cpuset (7),
+.BR numa (7),
+.BR migratepages (8),
+.BR numastat (8)
-- 
2.34.1

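The DESCRIPTION asks that every entry of pages[] be aligned to a page
boundary. A minimal caller-side sketch of that step follows. The names
phys_page_base() and align_phys_pages() are hypothetical helpers, not part
of this patch or of libnuma, and page_size is assumed to be a power of two
such as the value reported by sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): round a physical address
 * down to the start of the page that contains it.
 * page_size must be a nonzero power of two.
 */
static uint64_t phys_page_base(uint64_t phys_addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return phys_addr & ~(page_size - 1);
}

/*
 * Hypothetical helper: align every entry of a pages[] array in place
 * before it is handed to the syscall.
 */
static void align_phys_pages(uint64_t *pages, size_t count,
			     uint64_t page_size)
{
	for (size_t i = 0; i < count; i++)
		pages[i] = phys_page_base(pages[i], page_size);
}
```

Rounding down is one possible policy; a caller that would rather reject
unaligned input could compare each address with its page base instead.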

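The man page suggests preinitializing status[] with a value that is neither
a node number nor a status error, so that untouched entries stand out after
a partial failure. One way this could look is sketched below. The sentinel
STATUS_UNTOUCHED, the enum, and the function names are assumptions made for
illustration; the sketch only inspects the array and does not invoke the
syscall.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical sentinel: INT_MIN is neither a node number (>= 0)
 * nor a negated errno value, so it cannot be a real status entry.
 */
#define STATUS_UNTOUCHED INT_MIN

enum page_outcome {
	PAGE_UNTOUCHED,	/* entry still holds the sentinel */
	PAGE_ON_NODE,	/* entry is the node the page resides on */
	PAGE_ERROR	/* entry is a negated errno, e.g. -EBUSY */
};

/* Fill status[] with the sentinel before making the call. */
static void status_prefill(int *status, size_t count)
{
	assert(status != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		status[i] = STATUS_UNTOUCHED;
}

/* Classify one status entry after the call has returned. */
static enum page_outcome status_classify(int status)
{
	if (status == STATUS_UNTOUCHED)
		return PAGE_UNTOUCHED;
	if (status >= 0)
		return PAGE_ON_NODE;
	return PAGE_ERROR;
}

/* Count the entries that fall into one outcome class. */
static size_t status_count(const int *status, size_t count,
			   enum page_outcome want)
{
	size_t n = 0;

	for (size_t i = 0; i < count; i++)
		if (status_classify(status[i]) == want)
			n++;
	return n;
}
```

After a call that returns a positive value, status_count(status, count,
PAGE_ERROR) gives the number of entries that carry a per-page error, and
the entries classified as PAGE_UNTOUCHED are the ones the kernel never
wrote.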