From: Cliff Wickman <cpw@sgi.com>
To: Vasileios Karakasis <bkk@cslab.ece.ntua.gr>
Cc: linux-numa@vger.kernel.org
Subject: Re: realloc function
Date: Tue, 11 Jan 2011 10:29:51 -0600 [thread overview]
Message-ID: <20110111162951.GA19748@sgi.com> (raw)
In-Reply-To: <4D2B8454.3030900@cslab.ece.ntua.gr>
Hi Vasileios,
Thanks.
And thanks to Andi for the review. I'll put your patch into 2.0.7-rc1.
I'd like to solve an unrelated regression before I put it on
the download page.
-Cliff
On Tue, Jan 11, 2011 at 12:12:36AM +0200, Vasileios Karakasis wrote:
> Hi,
>
> I am submitting the final patch. Essentially, it is my original enhanced
> with some comments about the rationale as we discussed it here and an
> entry + brief description in the man page.
>
> Regards,
>
> On 01/05/2011 09:25 PM, Andi Kleen wrote:
> > On Wed, Jan 05, 2011 at 05:00:43PM +0200, Vasileios Karakasis wrote:
> >> Peeking inside the mremap() source, I can see that the kernel already
> >> does this, i.e., mremap() preserves the policy of the original vm area.
> >
> > That is true.
> >>
> >> The problem is when the user has not specified a binding for the
> >> original mapping (default policy), in which case copying explicitly the
> >> policy from the old to the new pages won't work either; the new pages
> >> will still have MPOL_DEFAULT. So realloc() cannot guarantee that the new
> >
> >
> > It would be possible to do
> >
> > get_mempolicy MPOL_F_ADDR
> > if policy == MPOL_DEFAULT:
> > get_mempolicy MPOL_F_NODE|MPOL_F_ADDR, &node
> > mbind MPOL_PREFERRED, node
> >
> > But then you end up with preferred instead of default. It should
> > be usually the same, but may not in some corner cases.
> >
> > I guess you're right and that case is too obscure to care about.
> > I guess your original patch without anything was good enough.
> > It may be worth it to add some comments on this rationale though.
> >
> >
> >> pages will be allocated on the same node as the preceding alloc(),
> >> unless there is a way to obtain the actual node that the pages of the
> >> original allocation were allocated on. In my opinion, this isn't a real
> >> problem, because even the simple numa_alloc() using the default policy,
> >> cannot guarantee that the pages will be allocated on the node of the
> >> calling cpu: what if the task is migrated to a different cpu on a
> >> different node, while touching (i.e., allocating) the pages with the
> >> police_memory_int()?
> >
> > process policy and MPOL_DEFAULT are always just heuristics; such races
> > can always occur. They usually should not because the scheduler
> > does not migrate too frequently.
> >
> > -Andi
>
> --
> V.K.
> diff -urN numactl-2.0.6-orig/libnuma.c numactl-2.0.6/libnuma.c
> --- numactl-2.0.6-orig/libnuma.c 2011-01-03 15:09:23.000000000 +0200
> +++ numactl-2.0.6/libnuma.c 2011-01-10 23:49:58.000000000 +0200
> @@ -871,6 +871,23 @@
> return mem;
> }
>
> +void *numa_realloc(void *old_addr, size_t old_size, size_t new_size)
> +{
> + char *mem;
> + mem = mremap(old_addr, old_size, new_size, MREMAP_MAYMOVE);
> + if (mem == (char *)-1)
> + return NULL;
> + /*
> + * The memory policy of the allocated pages is preserved by mremap(), so
> + * there is no need to (re)set it here. If the policy of the original
> + * allocation is not set, the new pages will be allocated according to the
> + * process' mempolicy. Trying to allocate explicitly the new pages on the
> + * same node as the original ones would require changing the policy of the
> + * newly allocated pages, which violates the numa_realloc() semantics.
> + */
> + return mem;
> +}
> +
> void *numa_alloc_interleaved_subset_v1(size_t size, const nodemask_t *mask)
> {
> char *mem;
> diff -urN numactl-2.0.6-orig/Makefile numactl-2.0.6/Makefile
> --- numactl-2.0.6-orig/Makefile 2011-01-03 15:09:23.000000000 +0200
> +++ numactl-2.0.6/Makefile 2011-01-03 23:22:57.000000000 +0200
> @@ -31,7 +31,7 @@
> test/after test/before threadtest test_move_pages \
> test/mbind_mig_pages test/migrate_pages \
> migratepages migspeed migspeed.o libnuma.a \
> - test/move_pages
> + test/move_pages test/realloc_test
> SOURCES := bitops.c libnuma.c distance.c memhog.c numactl.c numademo.c \
> numamon.c shm.c stream_lib.c stream_main.c syscall.c util.c mt.c \
> clearcache.c test/*.c
> @@ -43,7 +43,7 @@
> all: numactl migratepages migspeed libnuma.so numademo numamon memhog \
> test/tshared stream test/mynode test/pagesize test/ftok test/prefered \
> test/randmap test/nodemap test/distance test/tbitmap test/move_pages \
> - test/mbind_mig_pages test/migrate_pages libnuma.a
> + test/mbind_mig_pages test/migrate_pages test/realloc_test libnuma.a
>
> numactl: numactl.o util.o shm.o bitops.o libnuma.so
>
> @@ -123,6 +123,8 @@
>
> test/migrate_pages: test/migrate_pages.c libnuma.so
>
> +test/realloc_test: test/realloc_test.c libnuma.so
> +
> .PHONY: install all clean html depend
>
> MANPAGES := numa.3 numactl.8 numastat.8 migratepages.8 migspeed.8
> diff -urN numactl-2.0.6-orig/numa.3 numactl-2.0.6/numa.3
> --- numactl-2.0.6-orig/numa.3 2011-01-03 15:09:23.000000000 +0200
> +++ numactl-2.0.6/numa.3 2011-01-10 23:39:02.000000000 +0200
> @@ -87,6 +87,8 @@
> .BI "void *numa_alloc_interleaved_subset(size_t " size ", struct bitmask *" nodemask );
> .BI "void *numa_alloc(size_t " size );
> .br
> +.BI "void *numa_realloc(void *"old_addr ", size_t " old_size ", size_t " new_size );
> +.br
> .BI "void numa_free(void *" start ", size_t " size );
> .sp
> .BI "int numa_run_on_node(int " node );
> @@ -599,6 +601,39 @@
> .BR numa_free ().
> On errors NULL is returned.
>
> +.BR numa_realloc ()
> +changes the size of the memory area pointed to by
> +.I old_addr
> +from
> +.I old_size
> +to
> +.I new_size.
> +The memory area pointed to by
> +.I old_addr
> +must have been allocated with one of the
> +.BR numa_alloc*
> +functions.
> +The
> +.I new_size
> +will be rounded up to a multiple of the system page size. The contents of the
> +memory area will be unchanged to the minimum of the old and new sizes; newly
> +allocated memory will be uninitialized. The memory policy (and node bindings)
> +associated with the original memory area will be preserved in the resized
> +area. For example, if the initial area was allocated with a call to
> +.BR numa_alloc_onnode(),
> +then the new pages (if the area is enlarged) will be allocated on the same node.
> +However, if no memory policy was set for the original area, then
> +.BR numa_realloc ()
> +cannot guarantee that the new pages will be allocated on the same node. On
> +success, the address of the resized area is returned (which might be different
> +from that of the initial area), otherwise NULL is returned and
> +.I errno
> +is set to indicate the error. The pointer returned by
> +.BR numa_realloc ()
> +is suitable for passing to
> +.BR numa_free ().
> +
> +
> .BR numa_free ()
> frees
> .I size
> diff -urN numactl-2.0.6-orig/numa.h numactl-2.0.6/numa.h
> --- numactl-2.0.6-orig/numa.h 2011-01-03 15:09:23.000000000 +0200
> +++ numactl-2.0.6/numa.h 2011-01-11 00:06:12.000000000 +0200
> @@ -212,6 +212,8 @@
> void *numa_alloc_local(size_t size);
> /* Allocation with current policy */
> void *numa_alloc(size_t size);
> +/* Change the size of a memory area preserving the memory policy */
> +void *numa_realloc(void *old_addr, size_t old_size, size_t new_size);
> /* Free memory allocated by the functions above */
> void numa_free(void *mem, size_t size);
>
> diff -urN numactl-2.0.6-orig/test/realloc_test.c numactl-2.0.6/test/realloc_test.c
> --- numactl-2.0.6-orig/test/realloc_test.c 1970-01-01 02:00:00.000000000 +0200
> +++ numactl-2.0.6/test/realloc_test.c 2011-01-10 23:55:37.000000000 +0200
> @@ -0,0 +1,108 @@
> +#include <assert.h>
> +#include <errno.h>
> +#include <limits.h>
> +#include <unistd.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <sys/mman.h>
> +#include "numa.h"
> +#include "numaif.h"
> +
> +#define DEFAULT_NR_PAGES 1024
> +
> +static int parse_int(const char *str)
> +{
> + char *endptr;
> + long ret = strtol(str, &endptr, 0);
> + if (*endptr != '\0') {
> + fprintf(stderr, "[error] strtol() failed: parse error: %s\n", endptr);
> + exit(1);
> + }
> +
> + if (errno == ERANGE)
> + fprintf(stderr, "[warning] strtol() out of range\n");
> +
> + if (ret > INT_MAX || ret < INT_MIN) {
> + fprintf(stderr, "[warning] parse_int() out of range\n");
> + ret = (ret > 0) ? INT_MAX : INT_MIN;
> + }
> +
> + return (int) ret;
> +}
> +
> +int main(int argc, char **argv)
> +{
> + char *mem;
> + int page_size = numa_pagesize();
> + int node = 0;
> + int nr_pages = DEFAULT_NR_PAGES;
> +
> + if (numa_available() < 0) {
> + fprintf(stderr, "numa is not available");
> + exit(1);
> + }
> +
> + if (argc > 1)
> + node = parse_int(argv[1]);
> + if (argc > 2)
> + nr_pages = parse_int(argv[2]);
> +
> + mem = numa_alloc_onnode(page_size, node);
> +
> + /* Store the policy of the newly allocated area */
> + unsigned long nodemask;
> + int mode;
> + int nr_nodes = numa_num_possible_nodes();
> + if (get_mempolicy(&mode, &nodemask, nr_nodes, mem,
> + MPOL_F_NODE | MPOL_F_ADDR) < 0) {
> + perror("get_mempolicy() failed");
> + exit(1);
> + }
> +
> + /* Print some info */
> + printf("Page size: %d\n", page_size);
> + printf("Pages realloc'ed: %d\n", nr_pages);
> + printf("Allocate data in node: %d\n", node);
> +
> + int i;
> + int nr_inplace = 0;
> + int nr_moved = 0;
> + for (i = 0; i < nr_pages; i++) {
> + /* Enlarge mem with one more page */
> + char *new_mem = numa_realloc(mem, (i+1)*page_size, (i+2)*page_size);
> + if (!new_mem) {
> + perror("numa_realloc() failed");
> + exit(1);
> + }
> +
> + if (new_mem == mem)
> + ++nr_inplace;
> + else
> + ++nr_moved;
> + mem = new_mem;
> +
> + /* Check the policy of the realloc'ed area */
> + unsigned long realloc_nodemask;
> + int realloc_mode;
> + if (get_mempolicy(&realloc_mode, &realloc_nodemask,
> + nr_nodes, mem, MPOL_F_NODE | MPOL_F_ADDR) < 0) {
> + perror("get_mempolicy() failed");
> + exit(1);
> + }
> +
> + assert(realloc_nodemask == nodemask &&
> + realloc_mode == mode && "policy changed");
> + }
> +
> + /* Shrink to the original size */
> + mem = numa_realloc(mem, (nr_pages + 1)*page_size, page_size);
> + if (!mem) {
> + perror("numa_realloc() failed");
> + exit(1);
> + }
> +
> + numa_free(mem, page_size);
> + printf("In-place reallocs: %d\n", nr_inplace);
> + printf("Moved reallocs: %d\n", nr_moved);
> + return 0;
> +}
> diff -urN numactl-2.0.6-orig/versions.ldscript numactl-2.0.6/versions.ldscript
> --- numactl-2.0.6-orig/versions.ldscript 2011-01-03 15:09:23.000000000 +0200
> +++ numactl-2.0.6/versions.ldscript 2011-01-10 18:36:37.000000000 +0200
> @@ -87,6 +87,7 @@
> numa_alloc_interleaved_subset;
> numa_alloc_local;
> numa_alloc_onnode;
> + numa_realloc;
> numa_allocate_cpumask;
> numa_allocate_nodemask;
> numa_available;
--
Cliff Wickman
SGI
cpw@sgi.com
(651) 683-3824
prev parent reply other threads:[~2011-01-11 16:29 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-02 17:37 realloc function Vasileios Karakasis
2011-01-02 23:42 ` Andi Kleen
2011-01-03 21:56 ` Vasileios Karakasis
2011-01-03 22:44 ` Cliff Wickman
2011-01-04 22:20 ` Andi Kleen
2011-01-05 12:11 ` Vasileios Karakasis
2011-01-05 15:00 ` Vasileios Karakasis
2011-01-05 19:25 ` Andi Kleen
2011-01-10 22:12 ` Vasileios Karakasis
2011-01-10 22:17 ` Andi Kleen
2011-01-11 16:29 ` Cliff Wickman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110111162951.GA19748@sgi.com \
--to=cpw@sgi.com \
--cc=bkk@cslab.ece.ntua.gr \
--cc=linux-numa@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).