* Re: RAM consumption when working with the gcc repo
2007-12-07 20:46 ` Nicolas Pitre
@ 2007-12-07 21:23 ` Jon Smirl
2007-12-07 21:25 ` Marco Costalba
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Jon Smirl @ 2007-12-07 21:23 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: david, Git Mailing List
On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 7 Dec 2007, david@lang.hm wrote:
>
> > On Fri, 7 Dec 2007, Jon Smirl wrote:
> >
> > > I noticed two things when doing a repack of the gcc repo. First is
> > > that the git process is getting to be way too big. Turning off the
> > > delta caches had minimal impact. Why does the process still grow to
> > > 4.8GB?
> > >
> > > Putting this in perspective, this is a 4.8GB process constructing a
> > > 330MB file. Something isn't right. Memory leak or inefficient data
> > > structure?
> >
> > Keep in mind that that 330MB file is _very_ heavily compressed. The simple
> > zlib compression is probably getting you 10:1 or 20:1 compression, and the
> > delta compression is a significant multiplier on top of that.
>
> Doesn't matter. Something is indeed fishy.
>
> The bulk of pack-objects memory consumption can be estimated as follows:
>
> 1M objects * sizeof(struct object_entry) ~= 100MB
> 256 window entries with data (assuming a big 1MB per entry) = 256MB
> Delta result caching was disabled therefore 0MB
> read-side delta cache limited to 16MB
>
> So the pure RAM allocation might get to roughly 400MB.
>
> Then add the pack and index map, which, depending on the original pack
> size,
> might be 2GB.
I'm repacking the heavily compressed pack, so the input pack and index
are about 360MB, not 2GB.
>
> So we're pessimistically talking of about 2.5GB of virtual space.
>
> The other 2.3GB is hard to explain.
More like 3.5GB that is hard to explain.
Is there a simple way to tell what percent is mmap vs anon allocation?
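One rough way to answer this from /proc/<pid>/maps: mappings carrying a file path are mmap'ed file data, while pathless entries and pseudo-paths like [heap] or [stack] are anonymous memory. A sketch (the sample lines are taken from the map dump later in this thread; field layout per proc(5)):

```python
# Classify /proc/<pid>/maps lines into file-backed vs anonymous memory
# and total the sizes.  Sample lines come from Jon's later map dump.
SAMPLE_MAPS = """\
2ba1f703b000-2ba23703b000 r--p 00000000 09:01 33079321 /video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
2ba23703b000-2ba23703c000 rw-p 2ba23703b000 00:00 0
006bd000-0c17f000 rw-p 006bd000 00:00 0 [heap]"""

def classify(maps_text):
    anon = file_backed = 0
    for line in maps_text.splitlines():
        fields = line.split()
        start, end = (int(x, 16) for x in fields[0].split("-"))
        size = end - start
        # A path in the sixth field means a file mapping; [heap]/[stack]
        # pseudo-paths and pathless entries are anonymous allocations.
        if len(fields) >= 6 and not fields[5].startswith("["):
            file_backed += size
        else:
            anon += size
    return anon, file_backed

anon, fb = classify(SAMPLE_MAPS)
print("anon: %d MB, file-backed: %d MB" % (anon >> 20, fb >> 20))
```

Running the same classification over the full maps file of the live process would give the split directly.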
>
>
> Nicolas
>
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAM consumption when working with the gcc repo
2007-12-07 20:46 ` Nicolas Pitre
2007-12-07 21:23 ` Jon Smirl
@ 2007-12-07 21:25 ` Marco Costalba
2007-12-08 11:54 ` Johannes Schindelin
2007-12-07 21:27 ` Jon Smirl
` (3 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Marco Costalba @ 2007-12-07 21:25 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: david, Jon Smirl, Git Mailing List
On Dec 7, 2007 9:46 PM, Nicolas Pitre <nico@cam.org> wrote:
>
> The other 2.3GB is hard to explain.
>
BTW, does a tool exist to profile memory consumption by each
source-level struct, vector, or other data container?
Valgrind mainly checks for memory leaks; callgrind gives profiling
information in terms of call graphs and the times/cycles consumed by
each function.
What I _really_ would like is a tool that lets me *easily* check how
much memory is used at a given point in time by each data container at
source level.
Something like this:
At checkpoint "trigger_now":
struct my_data is instantiated 120234 times
struct super_delta is instantiated 100000 times
At checkpoint "trigger_also_now":
struct my_data is instantiated 12 times
struct super_delta is instantiated 70 times
.....
That would be AWESOME!!! A killer debugging tool!
Thanks
Marco
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAM consumption when working with the gcc repo
2007-12-07 21:25 ` Marco Costalba
@ 2007-12-08 11:54 ` Johannes Schindelin
2007-12-08 19:12 ` Marco Costalba
0 siblings, 1 reply; 15+ messages in thread
From: Johannes Schindelin @ 2007-12-08 11:54 UTC (permalink / raw)
To: Marco Costalba; +Cc: Nicolas Pitre, david, Jon Smirl, Git Mailing List
Hi,
On Fri, 7 Dec 2007, Marco Costalba wrote:
> On Dec 7, 2007 9:46 PM, Nicolas Pitre <nico@cam.org> wrote:
> >
> > The other 2.3GB is hard to explain.
>
> BTW, does a tool exist to profile memory consumption by each
> source-level struct, vector, or other data container?
>
> Valgrind mainly checks for memory leaks; callgrind gives profiling
> information in terms of call graphs and the times/cycles consumed by
> each function.
Have you looked at Massif (also part of Valgrind)?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAM consumption when working with the gcc repo
2007-12-08 11:54 ` Johannes Schindelin
@ 2007-12-08 19:12 ` Marco Costalba
0 siblings, 0 replies; 15+ messages in thread
From: Marco Costalba @ 2007-12-08 19:12 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Nicolas Pitre, david, Jon Smirl, Git Mailing List
On Dec 8, 2007 12:54 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >
> > BTW, does a tool exist to profile memory consumption by each
> > source-level struct, vector, or other data container?
> >
> > Valgrind mainly checks for memory leaks; callgrind gives profiling
> > information in terms of call graphs and the times/cycles consumed by
> > each function.
>
> Have you looked at Massif (also part of Valgrind)?
>
Only very quickly, so I've probably missed something, but anyway here
are my comments on Massif:
- The interesting output is in HTML format; the graph is nice but gives
only very general info.
- The tool mainly tracks who called malloc and friends, also walking
back up the stack frames (on my box, if I try to set the stack depth
to 4 instead of the default 3, the program crashes).
- It does not seem to give information about the structures for which
memory is allocated, only the names of the functions that directly or
indirectly allocate it. Nothing like:
struct my_data has 34,256 instantiations
struct stuff has 2,456 instantiations
- The relation between memory and time of allocation is IMHO a bit
confusing; it looks like a parameter obtained by multiplying allocated
bytes by time, which is not very useful.
- Related to the previous point, there seems to be no way to trigger a
snapshot of the memory map, in terms of the source-level structures
used by the application at a given time, from within the code.
Just to be clear, the much better callgrind tool lets the developer
insert the macros
CALLGRIND_START_INSTRUMENTATION and
CALLGRIND_STOP_INSTRUMENTATION
into the code to start/stop recording of events in the code ranges the
developer specifies. What seems to be missing in Massif is a macro like
MASSIF_SNAPSHOT_MEM_MAP
to be used _where_ the developer needs it, one that reports heap
allocation in terms of the source-level entities that *use* that
memory, not low-level addresses or allocator functions.
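As an aside, the triggered-snapshot interface being asked for does exist outside the C toolchain: Python's standard tracemalloc module takes a heap snapshot exactly where the code requests one, grouped by allocation site. Purely as a sketch of the desired model, not a solution for git's C code:

```python
import tracemalloc

tracemalloc.start()
blobs = [bytes(1000) for _ in range(100)]   # allocate ~100 KB
snap = tracemalloc.take_snapshot()          # MASSIF_SNAPSHOT_MEM_MAP, in effect

# The snapshot is grouped by source line, so it answers "which code owns
# this memory" rather than "which allocator function was called".
stats = snap.statistics("lineno")
total = sum(s.size for s in stats)
print("tracked %d bytes across %d allocation sites" % (total, len(stats)))
```

It still attributes memory to allocation sites rather than to struct types, but the on-demand, in-code snapshot trigger is exactly the missing piece described above.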
Thanks
Marco
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAM consumption when working with the gcc repo
2007-12-07 20:46 ` Nicolas Pitre
2007-12-07 21:23 ` Jon Smirl
2007-12-07 21:25 ` Marco Costalba
@ 2007-12-07 21:27 ` Jon Smirl
2007-12-07 21:39 ` Jon Smirl
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Jon Smirl @ 2007-12-07 21:27 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: david, Git Mailing List
On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 7 Dec 2007, david@lang.hm wrote:
>
> > On Fri, 7 Dec 2007, Jon Smirl wrote:
> >
> > > I noticed two things when doing a repack of the gcc repo. First is
> > > that the git process is getting to be way too big. Turning off the
> > > delta caches had minimal impact. Why does the process still grow to
> > > 4.8GB?
> > >
> > > Putting this in perspective, this is a 4.8GB process constructing a
> > > 330MB file. Something isn't right. Memory leak or inefficient data
> > > structure?
> >
> > Keep in mind that that 330MB file is _very_ heavily compressed. The simple
> > zlib compression is probably getting you 10:1 or 20:1 compression, and the
> > delta compression is a significant multiplier on top of that.
>
> Doesn't matter. Something is indeed fishy.
I didn't have any problem repacking Mozilla, and it ended up as a 450MB
pack file with 1.5M entries. So something has changed. With Mozilla I
had a 3GB machine, and now I can't finish a 330MB pack on a 4GB
machine. I don't recall the Mozilla process ever exceeding 2GB.
>
> The bulk of pack-objects memory consumption can be estimated as follows:
>
> 1M objects * sizeof(struct object_entry) ~= 100MB
> 256 window entries with data (assuming a big 1MB per entry) = 256MB
> Delta result caching was disabled therefore 0MB
> read-side delta cache limited to 16MB
>
> So the pure RAM allocation might get to roughly 400MB.
>
> Then add the pack and index map, which, depending on the original pack
> size,
> might be 2GB.
>
> So we're pessimistically talking of about 2.5GB of virtual space.
>
> The other 2.3GB is hard to explain.
>
>
> Nicolas
>
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAM consumption when working with the gcc repo
2007-12-07 20:46 ` Nicolas Pitre
` (2 preceding siblings ...)
2007-12-07 21:27 ` Jon Smirl
@ 2007-12-07 21:39 ` Jon Smirl
2007-12-07 21:50 ` Jon Smirl
2007-12-08 17:24 ` Martin Koegler
5 siblings, 0 replies; 15+ messages in thread
From: Jon Smirl @ 2007-12-07 21:39 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: david, Git Mailing List
Here's a big clue.
When I repack the 300MB file, the process grows to 4.8GB.
When I repack the 2,000MB file, the process grows to 3.3GB.
In both cases the last 10% of the repack takes as much time as the
first 90%.
At the end I am packing 60 objects/sec. In the beginning I was packing
1,000s of objects per second.
I'm not swapping:
jonsmirl@terra:/video/gcc/.git/objects/pack/foo$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 1416 25668 3904 2756404 0 0 62 45 115 398 6 0 93 1
3 0 1416 26880 3900 2754852 0 0 0 0 414 2453 26 1 73 0
2 0 1416 26880 3900 2754852 0 0 0 0 472 3518 26 1 73 0
4 0 1416 26912 3900 2754768 0 0 0 0 394 1642 26 1 74 0
2 0 1416 26912 3900 2754768 0 0 0 0 401 1364 25 0 75 0
2 0 1416 26896 3900 2754768 0 0 0 0 456 1922 25 1 75 0
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAM consumption when working with the gcc repo
2007-12-07 20:46 ` Nicolas Pitre
` (3 preceding siblings ...)
2007-12-07 21:39 ` Jon Smirl
@ 2007-12-07 21:50 ` Jon Smirl
2007-12-08 17:24 ` Martin Koegler
5 siblings, 0 replies; 15+ messages in thread
From: Jon Smirl @ 2007-12-07 21:50 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: david, Git Mailing List
This is for a 3.3GB process with the 2GB pack as input.
Looking at my process map, why is the pack file in the map four times?
2ba1f703b000-2ba23703b000 r--p 00000000 09:01 33079321
/video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
2ba23703b000-2ba23703c000 rw-p 2ba23703b000 00:00 0
2ba237c86000-2ba239352000 r--p 80000000 09:01 33079321
/video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
2ba2394b1000-2ba2794b1000 r--p 40000000 09:01 33079321
/video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
2ba2794b1000-2ba27a4b2000 rw-p 2ba2794b1000 00:00 0
2ba27bcb2000-2ba281c29000 rw-p 2ba23703c000 00:00 0
2ba281c29000-2ba2a32f5000 r--p 60000000 09:01 33079321
/video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
7fffb75e2000-7fffb75f7000 rw-p 7ffffffea000 00:00 0 [stack]
7fffb75fe000-7fffb7600000 r-xp 7fffb75fe000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
[vsyscall]
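One plausible reading (an assumption, not verified against this exact git version): git does not map a big pack in one piece but through sliding windows (core.packedGitWindowSize, 1 GiB by default on 64-bit platforms), so several windows of the same pack can be live at once; note the 0x40000000/0x60000000/0x80000000 file offsets in the mappings above. A toy sketch of windowed mmap over one file:

```python
import mmap, os, tempfile

# Map one file through two separate fixed-size windows at different
# offsets, the way git's sliding pack windows do (window size shrunk
# to one allocation granule for the demo).
WINDOW = mmap.ALLOCATIONGRANULARITY
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a" * WINDOW + b"b" * WINDOW + b"c" * WINDOW)
    path = f.name

fd = os.open(path, os.O_RDONLY)
w0 = mmap.mmap(fd, WINDOW, access=mmap.ACCESS_READ, offset=0)
w2 = mmap.mmap(fd, WINDOW, access=mmap.ACCESS_READ, offset=2 * WINDOW)

first, third = chr(w0[0]), chr(w2[0])
print(first, third)   # the two windows see different slices of the file

w0.close(); w2.close(); os.close(fd); os.unlink(path)
```

Each such window shows up as its own entry in the process map, which would explain one file appearing several times.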
Here's the heap:
00400000-004b9000 r-xp 00000000 08:16 296588
/usr/local/bin/git
006b9000-006bd000 rw-p 000b9000 08:16 296588
/usr/local/bin/git
006bd000-0c17f000 rw-p 006bd000 00:00 0 [heap]
40000000-40001000 ---p 40000000 00:00 0
40001000-40801000 rw-p 40001000 00:00 0
40801000-40802000 ---p 40801000 00:00 0
40802000-41002000 rw-p 40802000 00:00 0
41002000-41003000 ---p 41002000 00:00 0
41003000-41803000 rw-p 41003000 00:00 0
41803000-41804000 ---p 41803000 00:00 0
41804000-42004000 rw-p 41804000 00:00 0
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RAM consumption when working with the gcc repo
2007-12-07 20:46 ` Nicolas Pitre
` (4 preceding siblings ...)
2007-12-07 21:50 ` Jon Smirl
@ 2007-12-08 17:24 ` Martin Koegler
5 siblings, 0 replies; 15+ messages in thread
From: Martin Koegler @ 2007-12-08 17:24 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: david, Jon Smirl, Git Mailing List
On Fri, Dec 07, 2007 at 03:46:30PM -0500, Nicolas Pitre wrote:
> On Fri, 7 Dec 2007, david@lang.hm wrote:
> > On Fri, 7 Dec 2007, Jon Smirl wrote:
> >
> > > I noticed two things when doing a repack of the gcc repo. First is
> > > that the git process is getting to be way too big. Turning off the
> > > delta caches had minimal impact. Why does the process still grow to
> > > 4.8GB?
> > >
> > > Putting this in perspective, this is a 4.8GB process constructing a
> > > 330MB file. Something isn't right. Memory leak or inefficient data
> > > structure?
> >
> > Keep in mind that that 330MB file is _very_ heavily compressed. The simple
> > zlib compression is probably getting you 10:1 or 20:1 compression, and the
> > delta compression is a significant multiplier on top of that.
>
> Doesn't matter. Something is indeed fishy.
>
> The bulk of pack-objects memory consumption can be estimated as follows:
>
> 1M objects * sizeof(struct object_entry) ~= 100MB
> 256 window entries with data (assuming a big 1MB per entry) = 256MB
For each (uncompressed) object in the delta window, a delta index is
created; it can be as large as the uncompressed object itself.
Each thread has its own window, so using 4 threads means having 1024
objects in memory => 1GB.
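Martin's arithmetic, spelled out (the 1 MB-per-object figure is Nicolas's pessimistic assumption from above):

```python
threads = 4
window = 256            # window entries held per thread
object_mb = 1           # pessimistic uncompressed size per entry

window_data_mb = threads * window * object_mb    # objects alone: ~1 GB
with_delta_index_mb = 2 * window_data_mb         # delta index can match
                                                 # each object's own size
print(window_data_mb, with_delta_index_mb)
```

So the threaded window data plus delta indexes alone could plausibly account for 1-2GB beyond the single-threaded estimate.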
> Delta result caching was disabled therefore 0MB
> read-side delta cache limited to 16MB
>
> So the pure RAM allocation might get to roughly 400MB.
>
> Then add the pack and index map, which, depending on the original pack
> size,
> might be 2GB.
>
> So we're pessimistically talking of about 2.5GB of virtual space.
>
> The other 2.3GB is hard to explain.
Regards, Martin Kögler
^ permalink raw reply [flat|nested] 15+ messages in thread