* [Linux-ia64] mmap and malloc questions on IA-64 linux
@ 2002-08-01 16:47 Olivier, JeffreyX
2002-08-01 18:09 ` David Mosberger
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Olivier, JeffreyX @ 2002-08-01 16:47 UTC (permalink / raw)
To: linux-ia64
I am trying to enhance a large distributed virtual shared memory system by
relaxing the constraints on the size of the shared heap. I am running
experiments with an OpenMP application that allocates a large shared heap,
writes to every location, and reads the value of every location on the
other machine. I realize that this is extremely inefficient, but it is meant
to test the robustness and functionality of this large shared heap.
Currently, the shared heap is allocated via the following mmap call:

    mmap(addr, len, prot, flags, fd, 0)

where
    addr  = 0
    len   = size of heap
    prot  = PROT_READ|PROT_WRITE
    flags = MAP_SHARED|MAP_FILE
    fd    = a file descriptor for a file called /scratch1/heap.<pid>

This file is created, opened, and extended by writing to its last byte, and
then it is unlinked before performing the mmap.
Also, our system is based on Lazy Release Consistency, so for each page there
is a twin. The behavior of our application forces us to need enough physical
space to hold the twins, so I also created a twin mapping following the same
procedure as above.
These mappings succeed and the program starts to run.
I also created wrappers in my application for malloc, free, realloc, and
calloc to monitor how much memory is requested by the program.
The machines I am running on are identical Itaniums running Red Hat Linux.
Both machines have 1.0 GB RAM and 1.5 GB of swap space. The /scratch1
partition is 18 GB and was added solely for testing this application.
For a shared heap size of 1.0 GB, the application runs correctly. The total
mmap for this app is 2.0 GB (shared heap and twins) and memory allocated
through the malloc family is about 300 MB.
For a shared heap size of 1.2 GB, the application runs but fails to
complete. One of the mallocs complains that the system is out of memory.
At this point, top reports that there is still 1 GB of swap space remaining
and as far as my understanding goes, the mmaped space is using the space on
/scratch1 for swap. After doing some research on the subject, I found a
number of newsgroup posts suggesting things like changing vm.overcommit_memory
(which looks like it might work, judging from the kernel source). This
didn't change anything. I was also able to run the same app with the same
parameters, with the same result, on just one of the machines, oversubscribing
that machine and therefore using twice the memory. I get to the same point
and fail just after the 300 MB mark.
So, naturally, I am at my wit's end here. I have a few questions that I
am sure some of you Linux gurus can answer.
A. Does the system have a limit on how much you can mmap? If so, why does
it wait until I actually use the space to run out?
B. Should the system be using the scratch disk to swap the shared heap? I
assume it does since df is reporting space being used while running the app.
C. Is there a limit on how much memory a process can ask for?
D. Would changing the freepages limits (currently 255 510 765) or
buffermem help?
E. Could these limits be affected by I/O or network traffic?
F. Are there any other limits that I am not thinking of?
Any help would be appreciated.
Thanks,
Jeff
* Re: [Linux-ia64] mmap and malloc questions on IA-64 linux
From: David Mosberger @ 2002-08-01 18:09 UTC (permalink / raw)
To: linux-ia64
>>>>> On Thu, 1 Aug 2002 09:47:37 -0700, "Olivier, JeffreyX" <jeffreyx.olivier@intel.com> said:
Olivier> For a shared heap size of 1.2 GB, the application runs but
Olivier> it fails to complete. One of the mallocs complains that it
Olivier> the system is out of memory.
Find out why the malloc is complaining. It could be that something
got mmap'ed relatively close to the break value. If so, attempting a
(small) malloc might cause the break value to cross into the mmaped
area, which would cause it to fail.
To track this down, I'd recommend looking at:
- /proc/PID/maps (where PID is the process ids of the "interesting"
tasks)
- the output of free
Hope this helps,
--david
* RE: [Linux-ia64] mmap and malloc questions on IA-64 linux
From: Olivier, JeffreyX @ 2002-08-02 15:25 UTC (permalink / raw)
To: linux-ia64
> - /proc/PID/maps (where PID is the process ids of the "interesting"
> tasks)
Thanks for the response.
/proc/PID/maps appears to be showing an entry for every single page in the
mapped files. Is this normal? Shouldn't there just be one map for the
whole file? At any rate, the mappings appear to be in the address range
described in the next paragraph.
Also, I started printing the addresses returned by malloc. I map the files
starting at 0x6000000300000000. The addresses returned by malloc are not
getting anywhere close to that limit. However, something very peculiar
happens. Just before the failed malloc, the last 3 or 4 successful mallocs
return addresses in the 0x2000000000000000 range, which is region 1 and
should(?) be reserved for shared memory (according to the figure on page 149
of your IA-64 kernel book). Any idea why this would happen? I don't
explicitly allocate memory as shared. At the point where this happens, the
current brk is at 0x6000000000898000. That is almost 12 GB from the first
mmap.
Here is the output of free just before it runs out of memory:

                 total       used       free     shared    buffers     cached
    Mem:        952576     948448       4128          0        656     836416
    -/+ buffers/cache:     111376     841200
    Swap:      1542176     650992     891184
To explain the large number of cached pages, /proc/sys/vm/pagecache is
currently set to
2 15 75
Thanks,
Jeff
* RE: [Linux-ia64] mmap and malloc questions on IA-64 linux
From: David Mosberger @ 2002-08-02 21:08 UTC (permalink / raw)
To: linux-ia64
>>>>> On Fri, 2 Aug 2002 08:25:50 -0700, "Olivier, JeffreyX" <jeffreyx.olivier@intel.com> said:
Olivier> /proc/PID/maps appears to be showing any entry for every
Olivier> single page in the mapped files. Is this normal?
Olivier> Shouldn't there just be one map for the whole file? At any
Olivier> rate, the mappings appear to be in the address range
Olivier> described in the next paragraph.
Yes, normally there is one entry per mapped file. If you use munmap()
and mmap() a lot, the merging logic in the kernel may not be able to
keep up, and then you'd get fragmented maps even when they could be
merged in theory. However, from what you have described so far, I do
not think this is something you'd be running into. So something seems
strange here.
Olivier> Also, I started printing the addresses returned by malloc.
Olivier> I map the files starting at 0x6000000300000000. The
Olivier> addresses for malloc are not getting anywhere close to that
Olivier> limit. However, something very peculiar happens. Just
Olivier> before the failed malloc, the last 3 or 4 successful
Olivier> mallocs return addresses in the 0x2000000000000000 range
Olivier> which is region 1 and should(?) be reserved for shared
Olivier> memory (according to the figure on page 149 or your ia64
Olivier> kernel book).
It's true that region 1 is used for shared memory, but it's also used
for any mmap() for which you don't specify a mapping address, so this
by itself doesn't look suspicious.
Olivier> Here is the output of free just before it runs out of memory:
Olivier>              total       used       free     shared    buffers     cached
Olivier> Mem:        952576     948448       4128          0        656     836416
Olivier> -/+ buffers/cache:     111376     841200
Olivier> Swap:      1542176     650992     891184
Nothing obviously wrong here. Clearly it's not an out-of-memory
situation.
Please provide a minimal test program that reproduces the problem.
Thanks,
--david
* RE: [Linux-ia64] mmap and malloc questions on IA-64 linux
From: Olivier, JeffreyX @ 2002-08-05 15:40 UTC (permalink / raw)
To: linux-ia64
>Yes, normally there is one entry per mapped file. If you use munmap()
>and mmap() a lot, the merging-logic in the kernel may not be able to
>keep up and then you'd get fragmented maps, even when they could be
>merged in theory. However, from what you have described so far, I do
>not think this is something you'd be running into. So something seems
>strange here.
Your suggestion that munmap() might fragment the map reminded me of
something else. Upon receiving a write notice from another node for a page,
our system uses mprotect() on the page so that we can't write to it without
causing a segmentation fault. Since we do this on a per-page basis, this is
likely the cause of the multiple mappings, and since we are doing it over
such a large address space, the Linux default of 65536 memory maps is
likely the problem. Does that seem reasonable?
I can see two possible solutions:
1. Develop an algorithm to efficiently combine memory mappings with the
same protections. This would be fairly straightforward for my program, but
for more sporadic memory accesses it might not work very well.
2. Change the default maximum number of mappings. I noticed a
/proc/sys/vm/max_map_count variable. Can this be increased safely?
Thanks for the help!
-Jeff
* RE: [Linux-ia64] mmap and malloc questions on IA-64 linux
From: David Mosberger @ 2002-08-05 20:31 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 5 Aug 2002 08:40:17 -0700, "Olivier, JeffreyX" <jeffreyx.olivier@intel.com> said:
Jeff> Your suggestion that munmap() might fragment the map reminded
Jeff> me of something else. Upon receiving a write notice from
Jeff> another node for a page, our system uses mprotect on the page
Jeff> so that we can't write to it without causing a segmentation
Jeff> fault. Since we do this on a per-page basis, this is likely
Jeff> the cause of the multiple mappings and since we are doing this
Jeff> over such a large address space, it is likely that the linux
Jeff> default of 65536 memory maps is the problem. Does that seem
Jeff> reasonable?
Yes, indeed.
Jeff> I can see two possible solutions:
Jeff> 1. Develop an algorithm to efficiently combine memory
Jeff> mappings with the same protections. This would be fairly
Jeff> straight forward for my program but for more sporadic memory
Jeff> accesses, it might not work very well.
That's what the Linux kernel did up to the pre-2.4.0 series of
patches. Then Linus ripped the merging logic out and replaced it with
something simpler. The old merging logic had some nasty SMP issues,
IIRC.
It's something that should be discussed on the general linux-kernel
mailing list (linux-kernel@vger.kernel.org), as this is not specific
to ia64.
Jeff> 2. Change the default maximum number of mappings. I noticed
Jeff> a /proc/sys/vm/max_map_count variable. Can this be increased
Jeff> safely?
In the standard 2.4.18 kernel, MAX_MAP_COUNT is a hardcoded constant.
But in either case, AFAIK, the main reason for the existence of the
limit is to have (some) protection against denial-of-service attacks,
where a single process would consume huge amounts of kernel memory.
There is no a priori limit in the kernel which would prevent you from
making the number as large as you want (well, within reason: map_count
is a signed 32-bit variable...).
--david
* Re: [Linux-ia64] mmap and malloc questions on IA-64 linux
From: Matthew Wilcox @ 2002-08-05 21:01 UTC (permalink / raw)
To: linux-ia64
On Mon, Aug 05, 2002 at 01:31:00PM -0700, David Mosberger wrote:
> That's what the Linux kernel did up to the pre-2.4.0 series of
> patches. Then Linus ripped the merging logic out and replaced it with
> something simpler. The old merging logic had some nasty SMP issues,
> IIRC.
Yeah, I think there was a lock inversion ...
> In the standard 2.4.18 kernel, MAX_MAP_COUNT is a hardcoded constant.
> But in either case, AFAIK, the main reason for the existence of the
> limit is to have (some) protection against denial-of-service attacks,
> where a single process would consume huge amounts of kernel memory.
> There is no a-prio limit in the kernel which would prevent you from
> making the number as large as you want (well, within reason: map_count
> is a signed 32-bit variable...).
Well... kind of. The VMA management code uses an RB tree, which scales
better than a linked list but is still not going to perform particularly
well when we get to hundreds of thousands of VMAs. BTW, the find_vma_prev
routine which ia64 uses in its fault handler path could be sped up by
using the one we have in the PA-RISC tree that was merged into 2.5 --
patch was posted to linux-mm:
http://mail.nl.linux.org/linux-mm/2002-06/msg00062.html
--
Revolutions do not require corporate support.