From: Vasileios Karakasis <bkk@cslab.ece.ntua.gr>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-numa@vger.kernel.org,
'Kornilios Kourtis' <kkourt@cslab.ece.ntua.gr>
Subject: Re: realloc function
Date: Tue, 11 Jan 2011 00:12:36 +0200 [thread overview]
Message-ID: <4D2B8454.3030900@cslab.ece.ntua.gr> (raw)
In-Reply-To: <20110105192532.GD25713@one.firstfloor.org>
[-- Attachment #1.1: Type: text/plain, Size: 2067 bytes --]
Hi,
I am submitting the final patch. Essentially, it is my original enhanced
with some comments about the rationale as we discussed it here and an
entry + brief description in the man page.
Regards,
On 01/05/2011 09:25 PM, Andi Kleen wrote:
> On Wed, Jan 05, 2011 at 05:00:43PM +0200, Vasileios Karakasis wrote:
>> Peeking inside the mremap() source, I can see that the kernel already
>> does this, i.e., mremap() preserves the policy of the original vm area.
>
> That is true.
>>
>> The problem is when the user has not specified a binding for the
>> original mapping (default policy), in which case copying explicitly the
>> policy from the old to the new pages won't work either; the new pages
>> will still have MPOL_DEFAULT. So realloc() cannot guarantee that the new
>
>
> It would be possible to do
>
> get_mempolicy MPOL_F_ADDR
> if policy == MPOL_DEFAULT:
> get_mempolicy MPOL_F_NODE|MPOL_F_ADDR, &node
> mbind MPOL_PREFERRED, node
>
> But then you end up with preferred instead of default. It should
> be usually the same, but may not in some corner cases.
>
> I guess you're right and that case is too obscure to care about.
> I guess your original patch without anything was good enough.
> It may be worth it to add some comments on this rationale though.
>
>
>> pages will be allocated on the same node as the preceding alloc(),
>> unless there is a way to obtain the actual node that the pages of the
>> original allocation were allocated on. In my opinion, this isn't a real
>> problem, because even the simple numa_alloc() using the default policy,
>> cannot guarantee that the pages will be allocated on the node of the
>> calling cpu: what if the task is migrated to a different cpu on a
>> different node, while touching (i.e., allocating) the pages with the
>> police_memory_int()?
>
> process policy and MPOL_DEFAULT are always just heuristics; such races
> can always occur. They usually should not because the scheduler
> does not migrate too frequently.
>
> -Andi
--
V.K.
[-- Attachment #1.2: numactl-2.0.6-realloc-patch --]
[-- Type: text/plain, Size: 7884 bytes --]
diff -urN numactl-2.0.6-orig/libnuma.c numactl-2.0.6/libnuma.c
--- numactl-2.0.6-orig/libnuma.c 2011-01-03 15:09:23.000000000 +0200
+++ numactl-2.0.6/libnuma.c 2011-01-10 23:49:58.000000000 +0200
@@ -871,6 +871,23 @@
return mem;
}
+void *numa_realloc(void *old_addr, size_t old_size, size_t new_size)
+{
+ char *mem;
+ mem = mremap(old_addr, old_size, new_size, MREMAP_MAYMOVE);
+ if (mem == (char *)-1)
+ return NULL;
+ /*
+ * The memory policy of the allocated pages is preserved by mremap(), so
+ * there is no need to (re)set it here. If the policy of the original
+ * allocation is not set, the new pages will be allocated according to the
+ * process' mempolicy. Trying to allocate explicitly the new pages on the
+ * same node as the original ones would require changing the policy of the
+ * newly allocated pages, which violates the numa_realloc() semantics.
+ */
+ return mem;
+}
+
void *numa_alloc_interleaved_subset_v1(size_t size, const nodemask_t *mask)
{
char *mem;
diff -urN numactl-2.0.6-orig/Makefile numactl-2.0.6/Makefile
--- numactl-2.0.6-orig/Makefile 2011-01-03 15:09:23.000000000 +0200
+++ numactl-2.0.6/Makefile 2011-01-03 23:22:57.000000000 +0200
@@ -31,7 +31,7 @@
test/after test/before threadtest test_move_pages \
test/mbind_mig_pages test/migrate_pages \
migratepages migspeed migspeed.o libnuma.a \
- test/move_pages
+ test/move_pages test/realloc_test
SOURCES := bitops.c libnuma.c distance.c memhog.c numactl.c numademo.c \
numamon.c shm.c stream_lib.c stream_main.c syscall.c util.c mt.c \
clearcache.c test/*.c
@@ -43,7 +43,7 @@
all: numactl migratepages migspeed libnuma.so numademo numamon memhog \
test/tshared stream test/mynode test/pagesize test/ftok test/prefered \
test/randmap test/nodemap test/distance test/tbitmap test/move_pages \
- test/mbind_mig_pages test/migrate_pages libnuma.a
+ test/mbind_mig_pages test/migrate_pages test/realloc_test libnuma.a
numactl: numactl.o util.o shm.o bitops.o libnuma.so
@@ -123,6 +123,8 @@
test/migrate_pages: test/migrate_pages.c libnuma.so
+test/realloc_test: test/realloc_test.c libnuma.so
+
.PHONY: install all clean html depend
MANPAGES := numa.3 numactl.8 numastat.8 migratepages.8 migspeed.8
diff -urN numactl-2.0.6-orig/numa.3 numactl-2.0.6/numa.3
--- numactl-2.0.6-orig/numa.3 2011-01-03 15:09:23.000000000 +0200
+++ numactl-2.0.6/numa.3 2011-01-10 23:39:02.000000000 +0200
@@ -87,6 +87,8 @@
.BI "void *numa_alloc_interleaved_subset(size_t " size ", struct bitmask *" nodemask );
.BI "void *numa_alloc(size_t " size );
.br
+.BI "void *numa_realloc(void *"old_addr ", size_t " old_size ", size_t " new_size );
+.br
.BI "void numa_free(void *" start ", size_t " size );
.sp
.BI "int numa_run_on_node(int " node );
@@ -599,6 +601,39 @@
.BR numa_free ().
On errors NULL is returned.
+.BR numa_realloc ()
+changes the size of the memory area pointed to by
+.I old_addr
+from
+.I old_size
+to
+.I new_size.
+The memory area pointed to by
+.I old_addr
+must have been allocated with one of the
+.BR numa_alloc*
+functions.
+The
+.I new_size
+will be rounded up to a multiple of the system page size. The contents of the
+memory area will be unchanged to the minimum of the old and new sizes; newly
+allocated memory will be uninitialized. The memory policy (and node bindings)
+associated with the original memory area will be preserved in the resized
+area. For example, if the initial area was allocated with a call to
+.BR numa_alloc_onnode(),
+then the new pages (if the area is enlarged) will be allocated on the same node.
+However, if no memory policy was set for the original area, then
+.BR numa_realloc ()
+cannot guarantee that the new pages will be allocated on the same node. On
+success, the address of the resized area is returned (which might be different
+from that of the initial area), otherwise NULL is returned and
+.I errno
+is set to indicate the error. The pointer returned by
+.BR numa_realloc ()
+is suitable for passing to
+.BR numa_free ().
+
+
.BR numa_free ()
frees
.I size
diff -urN numactl-2.0.6-orig/numa.h numactl-2.0.6/numa.h
--- numactl-2.0.6-orig/numa.h 2011-01-03 15:09:23.000000000 +0200
+++ numactl-2.0.6/numa.h 2011-01-11 00:06:12.000000000 +0200
@@ -212,6 +212,8 @@
void *numa_alloc_local(size_t size);
/* Allocation with current policy */
void *numa_alloc(size_t size);
+/* Change the size of a memory area preserving the memory policy */
+void *numa_realloc(void *old_addr, size_t old_size, size_t new_size);
/* Free memory allocated by the functions above */
void numa_free(void *mem, size_t size);
diff -urN numactl-2.0.6-orig/test/realloc_test.c numactl-2.0.6/test/realloc_test.c
--- numactl-2.0.6-orig/test/realloc_test.c 1970-01-01 02:00:00.000000000 +0200
+++ numactl-2.0.6/test/realloc_test.c 2011-01-10 23:55:37.000000000 +0200
@@ -0,0 +1,108 @@
+#include <assert.h>
+#include <errno.h>
+#include <limits.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include "numa.h"
+#include "numaif.h"
+
+#define DEFAULT_NR_PAGES 1024
+
+static int parse_int(const char *str)
+{
+ char *endptr;
+ long ret = strtol(str, &endptr, 0);
+ if (*endptr != '\0') {
+ fprintf(stderr, "[error] strtol() failed: parse error: %s\n", endptr);
+ exit(1);
+ }
+
+ if (errno == ERANGE)
+ fprintf(stderr, "[warning] strtol() out of range\n");
+
+ if (ret > INT_MAX || ret < INT_MIN) {
+ fprintf(stderr, "[warning] parse_int() out of range\n");
+ ret = (ret > 0) ? INT_MAX : INT_MIN;
+ }
+
+ return (int) ret;
+}
+
+int main(int argc, char **argv)
+{
+ char *mem;
+ int page_size = numa_pagesize();
+ int node = 0;
+ int nr_pages = DEFAULT_NR_PAGES;
+
+ if (numa_available() < 0) {
+ fprintf(stderr, "numa is not available");
+ exit(1);
+ }
+
+ if (argc > 1)
+ node = parse_int(argv[1]);
+ if (argc > 2)
+ nr_pages = parse_int(argv[2]);
+
+ mem = numa_alloc_onnode(page_size, node);
+
+ /* Store the policy of the newly allocated area */
+ unsigned long nodemask;
+ int mode;
+ int nr_nodes = numa_num_possible_nodes();
+ if (get_mempolicy(&mode, &nodemask, nr_nodes, mem,
+ MPOL_F_NODE | MPOL_F_ADDR) < 0) {
+ perror("get_mempolicy() failed");
+ exit(1);
+ }
+
+ /* Print some info */
+ printf("Page size: %d\n", page_size);
+ printf("Pages realloc'ed: %d\n", nr_pages);
+ printf("Allocate data in node: %d\n", node);
+
+ int i;
+ int nr_inplace = 0;
+ int nr_moved = 0;
+ for (i = 0; i < nr_pages; i++) {
+ /* Enlarge mem with one more page */
+ char *new_mem = numa_realloc(mem, (i+1)*page_size, (i+2)*page_size);
+ if (!new_mem) {
+ perror("numa_realloc() failed");
+ exit(1);
+ }
+
+ if (new_mem == mem)
+ ++nr_inplace;
+ else
+ ++nr_moved;
+ mem = new_mem;
+
+ /* Check the policy of the realloc'ed area */
+ unsigned long realloc_nodemask;
+ int realloc_mode;
+ if (get_mempolicy(&realloc_mode, &realloc_nodemask,
+ nr_nodes, mem, MPOL_F_NODE | MPOL_F_ADDR) < 0) {
+ perror("get_mempolicy() failed");
+ exit(1);
+ }
+
+ assert(realloc_nodemask == nodemask &&
+ realloc_mode == mode && "policy changed");
+ }
+
+ /* Shrink to the original size */
+ mem = numa_realloc(mem, (nr_pages + 1)*page_size, page_size);
+ if (!mem) {
+ perror("numa_realloc() failed");
+ exit(1);
+ }
+
+ numa_free(mem, page_size);
+ printf("In-place reallocs: %d\n", nr_inplace);
+ printf("Moved reallocs: %d\n", nr_moved);
+ return 0;
+}
diff -urN numactl-2.0.6-orig/versions.ldscript numactl-2.0.6/versions.ldscript
--- numactl-2.0.6-orig/versions.ldscript 2011-01-03 15:09:23.000000000 +0200
+++ numactl-2.0.6/versions.ldscript 2011-01-10 18:36:37.000000000 +0200
@@ -87,6 +87,7 @@
numa_alloc_interleaved_subset;
numa_alloc_local;
numa_alloc_onnode;
+ numa_realloc;
numa_allocate_cpumask;
numa_allocate_nodemask;
numa_available;
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
next prev parent reply other threads:[~2011-01-10 22:12 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-02 17:37 realloc function Vasileios Karakasis
2011-01-02 23:42 ` Andi Kleen
2011-01-03 21:56 ` Vasileios Karakasis
2011-01-03 22:44 ` Cliff Wickman
2011-01-04 22:20 ` Andi Kleen
2011-01-05 12:11 ` Vasileios Karakasis
2011-01-05 15:00 ` Vasileios Karakasis
2011-01-05 19:25 ` Andi Kleen
2011-01-10 22:12 ` Vasileios Karakasis [this message]
2011-01-10 22:17 ` Andi Kleen
2011-01-11 16:29 ` Cliff Wickman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D2B8454.3030900@cslab.ece.ntua.gr \
--to=bkk@cslab.ece.ntua.gr \
--cc=andi@firstfloor.org \
--cc=kkourt@cslab.ece.ntua.gr \
--cc=linux-numa@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).