From: Michael Kerrisk <mtk-manpages@gmx.net>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: clameter@sgi.com, akpm@linux-foundation.org, linux-mm@kvack.org,
ak@suse.de, Eric Whitney <eric.whitney@hp.com>
Subject: Re: [PATCH] Mempolicy Man Pages 2.64 1/3 - mbind.2
Date: Mon, 27 Aug 2007 13:29:02 +0200 [thread overview]
Message-ID: <46D2B57E.30800@gmx.net> (raw)
In-Reply-To: <1187798903.5166.12.camel@localhost>
Lee,
Thanks for the tremendous work on these pages!
I've applied the patch below, but added the following copyright line to this (and also to the other two pages):
.\" and Copyright 2007 Lee Schermerhorn, Hewlett Packard
Let me know if this okay/should be changed in any way.
This patch (and the others) applied for man-pages-2.65.
Cheers,
Michael
Lee Schermerhorn wrote:
> Michael:
>
> I've separated the mempolicy man page updates into 3 separate patches,
> against the 2.64 man pages. I've added a slightly less terse
> description of the changes for the change log.
>
> Here's the first of the 3--mbind.2. I updated the description of the
> interaction with MAP_SHARED to the wording you suggested. a while back.
>
> ---------------------------------
>
> [PATCH] Mempolicy Man Pages 2.64 1/3 - mbind.2
>
> Against: man pages 2.64
>
> Changes:
>
> + changed the "policy" parameter to "mode" through out the
> descriptions in an attempt to promote the concept that the memory
> policy is a tuple consisting of a mode and optional set of nodes.
>
> + rewrite portions of description for clarification.
>
> ++ clarify interaction of policy with mmap()'d files and shared
> memory regions, including SHM_HUGE regions.
>
> ++ defined how "empty set of nodes" specified and what this
> means for MPOL_PREFERRED.
>
> ++ mention what happens if local/target node contains no
> free memory.
>
> ++ clarify semantics of multiple nodes to BIND policy.
> Note: subject to change. We'll fix the man pages when/if
> this happens.
>
> + added all errors currently returned by sys call.
>
> + added mmap(2), shmget(2), shmat(2) to See Also list.
>
>
>
> man2/mbind.2 | 338 +++++++++++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 248 insertions(+), 90 deletions(-)
>
> Index: Linux/man2/mbind.2
> ===================================================================
> --- Linux.orig/man2/mbind.2 2007-08-22 11:22:00.000000000 -0400
> +++ Linux/man2/mbind.2 2007-08-22 11:56:58.000000000 -0400
> @@ -18,15 +18,16 @@
> .\" the source, must acknowledge the copyright and authors of this work.
> .\"
> .\" 2006-02-03, mtk, substantial wording changes and other improvements
> +.\" 2007-06-01, lts, more precise specification of behavior.
> .\"
> -.TH MBIND 2 2006-02-07 "Linux" "Linux Programmer's Manual"
> +.TH MBIND 2 "2007-06-01" "SuSE Labs" "Linux Programmer's Manual"
> .SH NAME
> mbind \- Set memory policy for a memory range
> .SH SYNOPSIS
> .nf
> .B "#include <numaif.h>"
> .sp
> -.BI "int mbind(void *" start ", unsigned long " len ", int " policy ,
> +.BI "int mbind(void *" start ", unsigned long " len ", int " mode ,
> .BI " unsigned long *" nodemask ", unsigned long " maxnode ,
> .BI " unsigned " flags );
> .sp
> @@ -34,76 +35,178 @@ mbind \- Set memory policy for a memory
> .fi
> .SH DESCRIPTION
> .BR mbind ()
> -sets the NUMA memory
> -.I policy
> +sets the NUMA memory policy,
> +which consists of a policy mode and zero or more nodes,
> for the memory range starting with
> .I start
> and continuing for
> .IR len
> bytes.
> The memory of a NUMA machine is divided into multiple nodes.
> -The memory policy defines in which node memory is allocated.
> +The memory policy defines from which node memory is allocated.
> +
> +If the memory range specified by the
> +.IR start " and " len
> +arguments includes an "anonymous" region of memory\(emthat is
> +a region of memory created using the
> +.BR mmap (2)
> +system call with the
> +.BR MAP_ANONYMOUS \(emor
> +a memory mapped file, mapped using the
> +.BR mmap (2)
> +system call with the
> +.B MAP_PRIVATE
> +flag, pages will only be allocated according to the specified
> +policy when the application writes [stores] to the page.
> +For anonymous regions, an initial read access will use a shared
> +page in the kernel containing all zeros.
> +For a file mapped with
> +.BR MAP_PRIVATE ,
> +an initial read access will allocate pages according to the
> +process policy of the process that causes the page to be allocated.
> +This may not be the process that called
> +.BR mbind ().
> +
> +The specified policy will be ignored for any
> +.B MAP_SHARED
> +mappings in the specified memory range.
> +Rather the pages will be allocated according to the process policy
> +of the process that caused the page to be allocated.
> +Again, this may not be the process that called
> +.BR mbind ().
> +
> +If the specified memory range includes a shared memory region
> +created using the
> +.BR shmget (2)
> +system call and attached using the
> +.BR shmat (2)
> +system call,
> +pages allocated for the anonymous or shared memory region will
> +be allocated according to the policy specified, regardless which
> +process attached to the shared memory segment causes the allocation.
> +If, however, the shared memory region was created with the
> +.B SHM_HUGETLB
> +flag,
> +the huge pages will be allocated according to the policy specified
> +only if the page allocation is caused by the task that calls
> +.BR mbind ()
> +for that region.
> +
> +By default,
> .BR mbind ()
> only has an effect for new allocations; if the pages inside
> the range have been already touched before setting the policy,
> then the policy has no effect.
> +This default behavior may be overridden by the
> +.BR MPOL_MF_MOVE
> +and
> +.B MPOL_MF_MOVE_ALL
> +flags described below.
>
> -Available policies are
> +The
> +.I mode
> +argument must specify one of
> .BR MPOL_DEFAULT ,
> .BR MPOL_BIND ,
> -.BR MPOL_INTERLEAVE ,
> -and
> +.B MPOL_INTERLEAVE
> +or
> .BR MPOL_PREFERRED .
> -All policies except
> +All policy modes except
> .B MPOL_DEFAULT
> -require the caller to specify the nodes to which the policy applies in the
> +require the caller to specify via the
> .I nodemask
> -parameter.
> +parameter,
> +the node or nodes to which the mode applies.
> +
> .I nodemask
> -is a bit mask of nodes containing up to
> +points to a bitmask of nodes containing up to
> .I maxnode
> bits.
> -The actual number of bytes transferred via this argument
> -is rounded up to the next multiple of
> +The bit mask size is rounded to the next multiple of
> .IR "sizeof(unsigned long)" ,
> but the kernel will only use bits up to
> .IR maxnode .
> -A NULL argument means an empty set of nodes.
> +A NULL value of
> +.I nodemask
> +or a
> +.I maxnode
> +value of zero specifies the empty set of nodes.
> +If the value of
> +.I maxnode
> +is zero,
> +the
> +.I nodemask
> +argument is ignored.
>
> The
> .B MPOL_DEFAULT
> -policy is the default and means to use the underlying process policy
> -(which can be modified with
> -.BR set_mempolicy (2)).
> -Unless the process policy has been changed this means to allocate
> -memory on the node of the CPU that triggered the allocation.
> +mode specifies that the default policy be used.
> +When applied to a range of memory via
> +.IR mbind (),
> +this means to use the process policy,
> + which may have been set with
> +.BR set_mempolicy (2).
> +If the mode of the process policy is also
> +.BR MPOL_DEFAULT ,
> +the system-wide default policy will be used.
> +The system-wide default policy will allocate
> +pages on the node of the CPU that triggers the allocation.
> +For
> +.BR MPOL_DEFAULT ,
> +the
> .I nodemask
> -should be specified as NULL.
> +and
> +.I maxnode
> +arguments must be specify the empty set of nodes.
>
> The
> .B MPOL_BIND
> -policy is a strict policy that restricts memory allocation to the
> -nodes specified in
> +mode specifies a strict policy that restricts memory allocation to
> +the nodes specified in
> +.IR nodemask .
> +If
> +.I nodemask
> +specifies more than one node, page allocations will come from
> +the node with the lowest numeric node id first, until that node
> +contains no free memory.
> +Allocations will then come from the node with the next highest
> +node id specified in
> +.I nodemask
> +and so forth, until none of the specified nodes contain free memory.
> +Pages will not be allocated from any node not specified in the
> .IR nodemask .
> -There won't be allocations on other nodes.
>
> +The
> .B MPOL_INTERLEAVE
> -interleaves allocations to the nodes specified in
> +mode specifies that page allocations be interleaved across the
> +set of nodes specified in
> .IR nodemask .
> -This optimizes for bandwidth instead of latency.
> +This optimizes for bandwidth instead of latency
> +by spreading out pages and memory accesses to those pages across
> +multiple nodes.
> To be effective the memory area should be fairly large,
> -at least 1MB or bigger.
> +at least 1MB or bigger with a fairly uniform access pattern.
> +Accesses to a single page of the area will still be limited to
> +the memory bandwidth of a single node.
>
> .B MPOL_PREFERRED
> sets the preferred node for allocation.
> -The kernel will try to allocate in this
> +The kernel will try to allocate pages from this
> node first and fall back to other nodes if the
> preferred nodes is low on free memory.
> -Only the first node in the
> +If
> +.I nodemask
> +specifies more than one node id, the first node in the
> +mask will be selected as the preferred node.
> +If the
> .I nodemask
> -is used.
> -If no node is set in the mask, then the memory is allocated on
> -the node of the CPU that triggered the allocation allocation).
> +and
> +.I maxnode
> +arguments specify the empty set, then the memory is allocated on
> +the node of the CPU that triggered the allocation.
> +This is the only way to specify "local allocation" for a
> +range of memory via
> +.IR mbind (2).
>
> If
> .B MPOL_MF_STRICT
> @@ -115,17 +218,18 @@ is not
> .BR MPOL_DEFAULT ,
> then the call will fail with the error
> .B EIO
> -if the existing pages in the mapping don't follow the policy.
> -In 2.6.16 or later the kernel will also try to move pages
> -to the requested node with this flag.
> +if the existing pages in the memory range don't follow the policy.
> +.\" According to the kernel code, the following is not true --lts
> +.\" In 2.6.16 or later the kernel will also try to move pages
> +.\" to the requested node with this flag.
>
> If
> .B MPOL_MF_MOVE
> -is passed in
> +is specified in
> .IR flags ,
> -then an attempt will be made to
> -move all the pages in the mapping so that they follow the policy.
> -Pages that are shared with other processes are not moved.
> +then the kernel will attempt to move all the existing pages
> +in the memory range so that they follow the policy.
> +Pages that are shared with other processes will not be moved.
> If
> .B MPOL_MF_STRICT
> is also specified, then the call will fail with the error
> @@ -136,8 +240,8 @@ If
> .B MPOL_MF_MOVE_ALL
> is passed in
> .IR flags ,
> -then all pages in the mapping will be moved regardless of whether
> -other processes use the pages.
> +then the kernel will attempt to move all existing pages in the memory range
> +regardless of whether other processes use the pages.
> The calling process must be privileged
> .RB ( CAP_SYS_NICE )
> to use this flag.
> @@ -146,6 +250,7 @@ If
> is also specified, then the call will fail with the error
> .B EIO
> if some pages could not be moved.
> +.\" ---------------------------------------------------------------
> .SH RETURN VALUE
> On success,
> .BR mbind ()
> @@ -153,11 +258,9 @@ returns 0;
> on error, \-1 is returned and
> .I errno
> is set to indicate the error.
> +.\" ---------------------------------------------------------------
> .SH ERRORS
> -.TP
> -.B EFAULT
> -There was a unmapped hole in the specified memory range
> -or a passed pointer was not valid.
> +.\" I think I got all of the error returns. --lts
> .TP
> .B EINVAL
> An invalid value was specified for
> @@ -169,55 +272,102 @@ or
> was less than
> .IR start ;
> or
> -.I policy
> -was
> -.B MPOL_DEFAULT
> +.I start
> +is not a multiple of the system page size.
> +Or,
> +.I mode
> +is
> +.I MPOL_DEFAULT
> and
> .I nodemask
> -pointed to a non-empty set;
> +specified a non-empty set;
> or
> -.I policy
> -was
> -.B MPOL_BIND
> +.I mode
> +is
> +.I MPOL_BIND
> or
> -.B MPOL_INTERLEAVE
> +.I MPOL_INTERLEAVE
> and
> .I nodemask
> -pointed to an empty set,
> +is empty.
> +Or,
> +.I maxnode
> +specifies more than a page worth of bits.
> +Or,
> +.I nodemask
> +specifies one or more node ids that are
> +greater than the maximum supported node id,
> +or are not allowed in the calling task's context.
> +.\" "calling task's context" refers to cpusets. No man page avail to ref. --lts
> +Or, none of the node ids specified by
> +.I nodemask
> +are on-line, or none of the specified nodes contain memory.
> +.TP
> +.B EFAULT
> +Part of all of the memory range specified by
> +.I nodemask
> +and
> +.I maxnode
> +points outside your accessible address space.
> +Or, there was a unmapped hole in the specified memory range.
> .TP
> .B ENOMEM
> -System out of memory.
> +Insufficient kernel memory was available.
> .TP
> .B EIO
> .B MPOL_MF_STRICT
> was specified and an existing page was already on a node
> -that does not follow the policy.
> -.SH CONFORMING TO
> -This system call is Linux specific.
> +that does not follow the policy;
> +or
> +.B MPOL_MF_MOVE
> +or
> +.B MPOL_MF_MOVE_ALL
> +was specified and the kernel was unable to move all existing
> +pages in the range.
> +.TP
> +.B EPERM
> +The
> +.I flags
> +argument included the
> +.B MPOL_MF_MOVE_ALL
> +flag and the caller does not have the
> +.B CAP_SYS_NICE
> +privilege.
> +.\" ---------------------------------------------------------------
> .SH NOTES
> -NUMA policy is not supported on file mappings.
> +NUMA policy is not supported on a memory mapped file range
> +that was mapped with the
> +.I MAP_SHARED
> +flag.
>
> .B MPOL_MF_STRICT
> -is ignored on huge page mappings right now.
> +is ignored on huge page mappings.
>
> -It is unfortunate that the same flag,
> +The
> .BR MPOL_DEFAULT ,
> -has different effects for
> +mode has different effects for
> .BR mbind (2)
> and
> .BR set_mempolicy (2).
> -To select "allocation on the node of the CPU that
> -triggered the allocation" (like
> -.BR set_mempolicy (2)
> -.BR MPOL_DEFAULT )
> -when calling
> +When
> +.B MPOL_DEFAULT
> +is specified for a range of memory using
> .BR mbind (),
> +any pages subsequently allocated for that range will use
> +the process' policy, as set by
> +.BR set_mempolicy (2).
> +This effectively removes the explicit policy from the
> +specified range.
> +To select "local allocation" for a memory range,
> specify a
> -.I policy
> +.I mode
> of
> .B MPOL_PREFERRED
> -with an empty
> -.IR nodemask .
> +with an empty set of nodes.
> +This method will work for
> +.BR set_mempolicy (2),
> +as well.
> +.\" ---------------------------------------------------------------
> .SS "Versions and Library Support"
> The
> .BR mbind (),
> @@ -228,16 +378,18 @@ system calls were added to the Linux ker
> They are only available on kernels compiled with
> .BR CONFIG_NUMA .
>
> -Support for huge page policy was added with 2.6.16.
> -For interleave policy to be effective on huge page mappings the
> -policied memory needs to be tens of megabytes or larger.
> -
> -.B MPOL_MF_MOVE
> -and
> -.B MPOL_MF_MOVE_ALL
> -are only available on Linux 2.6.16 and later.
> +You can link with
> +.I -lnuma
> +to get system call definitions.
> +.I libnuma
> +and the required
> +.I numaif.h
> +header.
> +are available in the
> +.I numactl
> +package.
>
> -These system calls should not be used directly.
> +However, applications should not use these system calls directly.
> Instead, the higher level interface provided by the
> .BR numa (3)
> functions in the
> @@ -247,20 +399,26 @@ The
> .I numactl
> package is available at
> .IR ftp://ftp.suse.com/pub/people/ak/numa/ .
> -
> -You can link with
> -.I \-lnuma
> -to get system call definitions.
> -.I libnuma
> -is available in the
> -.I numactl
> +The package is also included in some Linux distributions.
> +Some distributions include the development library and header
> +in the separate
> +.I numactl-devel
> package.
> -This package also has the
> -.I numaif.h
> -header.
> +
> +Support for huge page policy was added with 2.6.16.
> +For interleave policy to be effective on huge page mappings the
> +policied memory needs to be tens of megabytes or larger.
> +
> +.B MPOL_MF_MOVE
> +and
> +.B MPOL_MF_MOVE_ALL
> +are only available on Linux 2.6.16 and later.
> +
> .SH SEE ALSO
> .BR numa (3),
> .BR numactl (8),
> .BR set_mempolicy (2),
> .BR get_mempolicy (2),
> -.BR mmap (2)
> +.BR mmap (2),
> +.BR shmget (2),
> +.BR shmat (2).
>
>
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7
Want to help with man page maintenance? Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-08-27 11:29 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-29 19:33 [PATCH] Document Linux Memory Policy Lee Schermerhorn
2007-05-29 20:04 ` Christoph Lameter
2007-05-29 20:16 ` Andi Kleen
2007-05-30 16:17 ` Lee Schermerhorn
2007-05-30 17:41 ` Christoph Lameter
2007-05-31 8:20 ` Michael Kerrisk
2007-05-31 14:49 ` Lee Schermerhorn
2007-05-31 15:56 ` Michael Kerrisk
2007-06-01 21:15 ` [PATCH] enhance memory policy sys call man pages v1 Lee Schermerhorn
2007-07-23 6:11 ` Michael Kerrisk
2007-07-23 6:32 ` mbind.2 man page patch Michael Kerrisk
2007-07-23 14:26 ` Lee Schermerhorn
2007-07-26 17:19 ` Michael Kerrisk
2007-07-26 18:06 ` Lee Schermerhorn
2007-07-26 18:18 ` Michael Kerrisk
2007-07-23 6:32 ` get_mempolicy.2 " Michael Kerrisk
2007-07-28 9:31 ` Michael Kerrisk
2007-08-09 18:43 ` Lee Schermerhorn
2007-08-09 20:57 ` Michael Kerrisk
2007-08-16 20:05 ` Andi Kleen
2007-08-18 5:50 ` Michael Kerrisk
2007-08-21 15:45 ` Lee Schermerhorn
2007-08-22 4:10 ` Michael Kerrisk
2007-08-22 16:08 ` [PATCH] Mempolicy Man Pages 2.64 1/3 - mbind.2 Lee Schermerhorn
2007-08-27 11:29 ` Michael Kerrisk [this message]
2007-08-22 16:10 ` [PATCH] Mempolicy Man Pages 2.64 2/3 - set_mempolicy.2 Lee Schermerhorn
2007-08-27 11:30 ` Michael Kerrisk
2007-08-22 16:12 ` [PATCH] Mempolicy Man Pages 2.64 3/3 - get_mempolicy.2 Lee Schermerhorn
2007-08-27 11:30 ` Michael Kerrisk
2007-08-27 10:46 ` get_mempolicy.2 man page patch Michael Kerrisk
2007-07-23 6:33 ` set_mempolicy.2 " Michael Kerrisk
2007-05-30 16:55 ` [PATCH] Document Linux Memory Policy Lee Schermerhorn
2007-05-30 17:56 ` Christoph Lameter
2007-05-31 6:18 ` Gleb Natapov
2007-05-31 6:41 ` Christoph Lameter
2007-05-31 6:47 ` Gleb Natapov
2007-05-31 6:56 ` Christoph Lameter
2007-05-31 7:11 ` Gleb Natapov
2007-05-31 7:24 ` Christoph Lameter
2007-05-31 7:39 ` Gleb Natapov
2007-05-31 17:43 ` Christoph Lameter
2007-05-31 17:07 ` Lee Schermerhorn
2007-05-31 10:43 ` Andi Kleen
2007-05-31 11:04 ` Gleb Natapov
2007-05-31 11:30 ` Gleb Natapov
2007-05-31 15:26 ` Lee Schermerhorn
2007-05-31 17:41 ` Gleb Natapov
2007-05-31 18:56 ` Lee Schermerhorn
2007-05-31 20:06 ` Gleb Natapov
2007-05-31 20:43 ` Andi Kleen
2007-06-01 9:38 ` Gleb Natapov
2007-06-01 10:21 ` Andi Kleen
2007-06-01 12:25 ` Gleb Natapov
2007-06-01 13:09 ` Andi Kleen
2007-06-01 17:15 ` Lee Schermerhorn
2007-06-01 18:43 ` Christoph Lameter
2007-06-01 19:38 ` Lee Schermerhorn
2007-06-01 19:48 ` Christoph Lameter
2007-06-01 21:05 ` Lee Schermerhorn
2007-06-01 21:56 ` Christoph Lameter
2007-06-04 13:46 ` Lee Schermerhorn
2007-06-04 16:34 ` Christoph Lameter
2007-06-04 17:02 ` Lee Schermerhorn
2007-06-04 17:11 ` Christoph Lameter
2007-06-04 20:23 ` Andi Kleen
2007-06-04 21:51 ` Christoph Lameter
2007-06-05 14:30 ` Lee Schermerhorn
2007-06-01 20:28 ` Gleb Natapov
2007-06-01 20:45 ` Christoph Lameter
2007-06-01 21:10 ` Lee Schermerhorn
2007-06-01 21:58 ` Christoph Lameter
2007-06-02 7:23 ` Gleb Natapov
2007-05-31 11:47 ` Andi Kleen
2007-05-31 11:59 ` Gleb Natapov
2007-05-31 12:15 ` Andi Kleen
2007-05-31 12:18 ` Gleb Natapov
2007-05-31 18:28 ` Lee Schermerhorn
2007-05-31 18:35 ` Christoph Lameter
2007-05-31 19:29 ` Lee Schermerhorn
2007-05-31 19:25 ` Paul Jackson
2007-05-31 20:22 ` Lee Schermerhorn
2007-05-29 20:07 ` Andi Kleen
2007-05-30 16:04 ` Lee Schermerhorn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=46D2B57E.30800@gmx.net \
--to=mtk-manpages@gmx.net \
--cc=Lee.Schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=eric.whitney@hp.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.