2GB process crashing on 2.4.14

public inbox for linux-kernel@vger.kernel.org

* 2GB process crashing on 2.4.14
@ 2001-12-07 12:58 Paul Sargent
  2001-12-07 13:16 ` Alan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Sargent @ 2001-12-07 12:58 UTC
  To: linux-kernel

Hi People,

I'm looking for some theories about a problem I've got on a Linux 2.4.14 box.
Basically we're using this box for running large chip design jobs. These
processes regularly get up to 1-2GB in size. All was fine until
recently, when the jobs started getting a little larger and growing above
the 2GB boundary. At this point the process crashes, reporting that it's
unable to obtain any more memory. The kernel logs nothing.

I'm trying to work out why the process doesn't start using swap. Swap
appears to be working, as all other processes have been swapped out. Is it
possible for a running process to be partially swapped out? Could it be that
the process is asking for memory which can't be swapped?

Does anybody who has knowledge of the VM system have any ideas what could
be going on, or any debugging techniques which might allow me to get some
more info? I've included a dump of the /proc/<pid>/maps output when the
process is around the 1GB level. Hopefully it may give some clues. Please
bear in mind when offering debug tips that the job has a run time of about
12 hours before it crashes, so quick spins aren't feasible.

I realise this could well not be a kernel issue, and just an app issue, but
I thought this would be the place where the knowledge would reside.

Thanks

Paul

P.S. I'll be monitoring the list from the Archives, but if people could cc
     me on replies that would make things easier.

-------------------------------------------------------

Configuration of the box:

AMD Athlon 1GHz
AMD 760 North Bridge
Via South Bridge

2GB Physical RAM (2x1GB DIMMs)
4GB Swap (2x2GB Partitions)

Linux Kernel 2.4.14 (Compiled with CONFIG_HIGHMEM4G, but not 64G)
libc6-2.2.4

-------------------------------------------------------

/proc/x/maps:

00002000-0140b000 r-xp 00000000 00:0c 4064305 /homes/magma_rel/blast2/2001_08_22.1047/linux22_x86/bin/mantle
0140b000-01c6a000 rw-p 01408000 00:0c 4064305 /homes/magma_rel/blast2/2001_08_22.1047/linux22_x86/bin/mantle
01c6a000-4b653000 rwxp 00000000 00:00 0
bca9a000-bd174000 rw-p 4792c000 00:00 0
bd1a5000-beb2e000 rw-p 480d1000 00:00 0
bec50000-bed89000 rw-p 0d88a000 00:00 0
bee5a000-bf050000 rw-p 0d696000 00:00 0
bf12b000-bf4b8000 rw-p 0d95f000 00:00 0
bf4c0000-bf658000 rw-p 0dcf4000 00:00 0
bf658000-bf661000 r-xp 00000000 03:02 1047       /lib/libnss_nis-2.2.4.so
bf661000-bf662000 rw-p 00008000 03:02 1047       /lib/libnss_nis-2.2.4.so
bf662000-bf673000 r-xp 00000000 03:02 1040       /lib/libnsl-2.2.4.so
bf673000-bf675000 rw-p 00010000 03:02 1040       /lib/libnsl-2.2.4.so
bf675000-bf677000 rw-p 00000000 00:00 0
bf677000-bf680000 r-xp 00000000 03:02 1048       /lib/libnss_nisplus-2.2.4.so
bf680000-bf682000 rw-p 00008000 03:02 1048       /lib/libnss_nisplus-2.2.4.so
bf682000-bf696000 r-xp 00000000 03:02 989        /lib/ld-2.2.4.so
bf696000-bf697000 rw-p 00013000 03:02 989        /lib/ld-2.2.4.so
bf697000-bf698000 rw-p 00000000 00:00 0
bf698000-bf7b0000 r-xp 00000000 03:02 1003       /lib/libc-2.2.4.so
bf7b0000-bf7b6000 rw-p 00117000 03:02 1003       /lib/libc-2.2.4.so
bf7b6000-bf7ba000 rw-p 00000000 00:00 0
bf7ba000-bf7c2000 r-xp 00000000 03:02 1045       /lib/libnss_files-2.2.4.so
bf7c2000-bf7c4000 rw-p 00007000 03:02 1045       /lib/libnss_files-2.2.4.so
bf7c7000-bf7ff000 rw-p 00014000 00:00 0
bffe5000-c0000000 rwxp fffe6000 00:00 0

-------------------------------------------------------

Snippets of dmesg:

Linux version 2.4.14 (root@gringo) (gcc version 2.95.4 20011006 (Debian prerelease)) #1 Thu Nov 8 13:01:18 GMT 2001
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007fff3000 (ACPI NVS)
 BIOS-e820: 000000007fff3000 - 0000000080000000 (ACPI data)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
1151MB HIGHMEM available.
On node 0 totalpages: 524272
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 294896 pages.
Kernel command line: auto BOOT_IMAGE=Linux ro root=302
Memory: 2061972k/2097088k available (1133k kernel code, 34732k reserved, 327k data, 200k init, 1179584k highmem)
Dentry-cache hash table entries: 262144 (order: 9, 2097152 bytes)
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces

-------------------------------------------------------

-- 
Paul Sargent
mailto: Paul.Sargent@3Dlabs.com

* Re: 2GB process crashing on 2.4.14
  2001-12-07 12:58 2GB process crashing on 2.4.14 Paul Sargent
@ 2001-12-07 13:16 ` Alan Cox
  2001-12-07 13:23   ` Paul Sargent
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Cox @ 2001-12-07 13:16 UTC
  To: Paul Sargent; +Cc: linux-kernel

> I'm trying to work out why the process doesn't start using swap. Swap
> appears to be working, as all other processes have been swapped out. Is it
> possible for a running process to be partially swapped out? Could it be that
> the process is asking for memory which can't be swapped?

Most probably the process is running out of address space to allocate from.
There is 3Gb of available space. Of that some is your stack, some your
binary, some your libraries.  Getting above 3Gb/process on x86 is very hairy
with a bad performance hit

Alan

* Re: 2GB process crashing on 2.4.14
  2001-12-07 13:16 ` Alan Cox
@ 2001-12-07 13:23   ` Paul Sargent
  2001-12-07 13:48     ` Alan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Sargent @ 2001-12-07 13:23 UTC
  To: Alan Cox; +Cc: linux-kernel

On Fri, Dec 07, 2001 at 01:16:49PM +0000, Alan Cox wrote:
> Most probably the process is running out of address space to allocate from.
> There is 3Gb of available space. 

That would be from 0x00000000 to 0xC0000000, Right?

> Of that some is your stack, some your
> binary, some your libraries.  Getting above 3Gb/process on x86 is very hairy
> with a bad performance hit

So if I was hitting this limit then I should see no / very few gaps, in the
/proc/<pid>/maps. Is that true?

Paul

-- 
Paul Sargent
Tel: +44 (1784) 476669
Fax: +44 (1784) 470699
mailto: Paul.Sargent@3Dlabs.com

* Re: 2GB process crashing on 2.4.14
  2001-12-07 13:23   ` Paul Sargent
@ 2001-12-07 13:48     ` Alan Cox
  2001-12-07 14:01       ` Paul Sargent
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Cox @ 2001-12-07 13:48 UTC
  To: Paul Sargent; +Cc: Alan Cox, linux-kernel

> > Most probably the process is running out of address space to allocate from.
> > There is 3Gb of available space. 
> 
> That would be from 0x00000000 to 0xC0000000, Right?

Correct (0xBFFFFFFF)

> > binary, some your libraries.  Getting above 3Gb/process on x86 is very hairy
> > with a bad performance hit
> 
> So if I was hitting this limit then I should see no / very few gaps, in the
> /proc/<pid>/maps. Is that true?

Providing the memory allocator it is using is sufficiently smart

* Re: 2GB process crashing on 2.4.14
  2001-12-07 13:48     ` Alan Cox
@ 2001-12-07 14:01       ` Paul Sargent
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Sargent @ 2001-12-07 14:01 UTC
  To: Alan Cox; +Cc: linux-kernel

On Fri, Dec 07, 2001 at 01:48:00PM +0000, Alan Cox wrote:
> > > Most probably the process is running out of address space to allocate from.
> > > There is 3Gb of available space. 
> > 
> > That would be from 0x00000000 to 0xC0000000, Right?
> 
> Correct (0xBFFFFFFF)
> 
> > > binary, some your libraries.  Getting above 3Gb/process on x86 is very hairy
> > > with a bad performance hit
> > 
> > So if I was hitting this limit then I should see no / very few gaps, in the
> > /proc/<pid>/maps. Is that true?
> 
> Providing the memory allocator it is using is sufficiently smart

Where "it" is the app?

OK, well, looking at the maps output, there seem to be three distinct
sections:

1) from 0x00000000 to 0x01c6a000 (30MB-ish) are mappings of the executable.

2) from 0xbca9a000 to 0xbfffffff (56MB-ish) are the libs, plus a few other
   areas, which I've assumed are stack, and scratch areas for the libs.

3) a single mapping, (was 1.1GB-ish in the map output I attached) which
   starts at the end of section 1, and is continually growing, and which I
   can see has no reason to stop until it gets to the start of section 2
   (some 3GB - 86MB later).
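
As a rough check on the three sections above, the arithmetic can be redone from the addresses in the maps dump attached to the first message. This is only a sketch: the names `EXECUTABLE`, `GROWING`, `HIGH_AREA`, `size_mib` and `gap_mib` are made up for illustration, and grouping the mappings into three regions follows the description above rather than anything the kernel reports.

```python
MIB = 1024 * 1024

# (start, end) pairs copied from the /proc/<pid>/maps dump in the first
# message. The grouping into three regions is an assumption that follows
# the description in this mail.
EXECUTABLE = (0x00002000, 0x01C6A000)  # both mappings of the executable
GROWING = (0x01C6A000, 0x4B653000)     # the single anonymous rwxp mapping
HIGH_AREA = (0xBCA9A000, 0xC0000000)   # libraries, stack, scratch areas


def size_mib(region):
    """Size of a (start, end) region in MiB."""
    start, end = region
    return (end - start) / MIB


def gap_mib(lower, upper):
    """Unmapped space in MiB between the end of lower and the start of upper."""
    return (upper[0] - lower[1]) / MIB


for name, region in (("executable", EXECUTABLE),
                     ("growing", GROWING),
                     ("high area", HIGH_AREA)):
    print(f"{name:<12}{size_mib(region):8.1f} MiB")
print(f"{'free gap':<12}{gap_mib(GROWING, HIGH_AREA):8.1f} MiB")
```

For the dump taken at the 1GB level this gives roughly 28 MiB, 1178 MiB and 53 MiB for the three sections, with about 1812 MiB of unmapped space between the growing mapping and the high area.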

Now admittedly, it's possible that some of the other mappings may grow by a
factor of 20 to suddenly eat up 1GB of address space, but I doubt it. So I'm
not buying the address space idea at the moment. That said, I'm not going to
discount it and will keep a log of what happens on the mappings while this
process is running, just in case something really wacky like that happens.

Paul
-- 
Paul Sargent
mailto: Paul.Sargent@3Dlabs.com

* Re: 2GB process crashing on 2.4.14
       [not found]   ` <20011207132317.E31161@3dlabs.com.suse.lists.linux.kernel>
@ 2001-12-07 14:02     ` Andi Kleen
  2001-12-07 14:15       ` Paul Sargent
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2001-12-07 14:02 UTC
  To: Paul Sargent; +Cc: linux-kernel

Paul Sargent <Paul.Sargent@3dlabs.com> writes:

> So if I was hitting this limit then I should see no / very few gaps, in the
> /proc/<pid>/maps. Is that true?

It usually fails when malloc() hits your libraries. One solution is to
recompile the kernel with a higher TASK_UNMAPPED_BASE (should be a sysctl,
but is a fixed define currently). That would force the shared libraries 
to start at a higher address and give sbrk() more breathing space.

Another way is to use mallopt(M_MMAP_THRESHOLD, ..) and set a low mmap
threshold. This allows malloc to use mmap earlier instead of sbrk() and skip
the shared library area. It comes at a cost though: malloc() tends to
become more expensive.

With some more changes you can also force the user space to 3.5GB, at the
cost of much less kernel memory. It usually makes sense to change 
TASK_UNMAPPED_BASE with this.
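
For a binary that cannot be rebuilt to call mallopt() itself, the low mmap threshold idea can sometimes be applied from outside the process. The sketch below is not something proposed in this thread: it assumes a glibc whose malloc reads the `MALLOC_MMAP_THRESHOLD_` environment variable (documented in mallopt(3) for current glibc; whether libc 2.2.4 honours it would need checking). The function names and the 64 KiB default are illustrative only.

```python
import os
import subprocess


def env_with_mmap_threshold(threshold_bytes, base_env=None):
    """Return a copy of the environment that asks glibc malloc to use
    mmap() for requests of at least threshold_bytes.

    Assumption: the C library in use honours MALLOC_MMAP_THRESHOLD_.
    """
    if threshold_bytes <= 0:
        raise ValueError("threshold must be positive")
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_MMAP_THRESHOLD_"] = str(threshold_bytes)
    return env


def run_with_low_threshold(command, threshold_bytes=64 * 1024):
    """Run command (a list of arguments) with the adjusted environment."""
    env = env_with_mmap_threshold(threshold_bytes)
    return subprocess.run(command, env=env)
```

Calling `run_with_low_threshold` with the job's usual argument list would start it with the variable set. The cost mentioned above still applies, since every allocation over the threshold becomes its own mmap() call.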

-Andi

P.S.: I'm pretty sure this is a FAQ.

* Re: 2GB process crashing on 2.4.14
  2001-12-07 14:02     ` Andi Kleen
@ 2001-12-07 14:15       ` Paul Sargent
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Sargent @ 2001-12-07 14:15 UTC
  To: Andi Kleen; +Cc: linux-kernel

On Fri, Dec 07, 2001 at 03:02:46PM +0100, Andi Kleen wrote:
> Paul Sargent <Paul.Sargent@3dlabs.com> writes:
> 
> 
> > So if I was hitting this limit then I should see no / very few gaps, in the
> > /proc/<pid>/maps. Is that true?
> 
> It usually fails when malloc() hits your libraries.

[remedies to increase available address space snipped]

> -Andi
> 
> P.S.: I'm pretty sure this is a FAQ.

Yes, I think you're right, it is. At least, I've come across it before, but
I'm not convinced this is my problem in this case.

I think it's failing about 1GB before getting to the bottom of the
libraries, but I'm going to monitor maps over the weekend to ensure it
doesn't start doing anything wacky just before crashing.
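
One way such monitoring could be done is sketched below: it re-reads /proc/<pid>/maps at intervals and appends a timestamped summary plus the raw dump to a log file, so the last snapshot before the crash survives. The function names, the log format and the 60 second interval are assumptions for illustration; the 0xC0000000 limit is the top of user space confirmed earlier in the thread.

```python
import time


def parse_maps(text):
    """Return sorted (start, end) address pairs from /proc/<pid>/maps text."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or "-" not in fields[0]:
            continue  # skip blank or unexpected lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        ranges.append((start, end))
    return sorted(ranges)


def summarise(ranges, limit=0xC0000000):
    """Return (total mapped bytes, largest unmapped gap below limit)."""
    mapped = sum(end - start for start, end in ranges)
    largest_gap = 0
    previous_end = 0
    for start, end in ranges:
        largest_gap = max(largest_gap, start - previous_end)
        previous_end = max(previous_end, end)
    largest_gap = max(largest_gap, limit - previous_end)
    return mapped, largest_gap


def log_maps(pid, log_path, interval_seconds=60):
    """Append a summary and the raw maps to log_path until the process exits."""
    while True:
        try:
            with open(f"/proc/{pid}/maps") as maps_file:
                text = maps_file.read()
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has gone away
        mapped, gap = summarise(parse_maps(text))
        with open(log_path, "a") as log:
            log.write(f"{time.ctime()} mapped={mapped} largest_gap={gap}\n")
            log.write(text + "\n")
        time.sleep(interval_seconds)
```

If the address-space theory is right, `largest_gap` in the final entries should have shrunk to nearly nothing; if the job dies with a gap of around 1GB still free, the cause lies elsewhere.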

Paul
-- 
Paul Sargent
Tel: +44 (1784) 476669
Fax: +44 (1784) 470699
mailto: Paul.Sargent@3Dlabs.com
