git.vger.kernel.org archive mirror
* dmalloc and leaks in git
@ 2007-12-08 20:53 Jon Smirl
  2007-12-08 20:58 ` Johannes Schindelin
  2007-12-09 20:57 ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Jon Smirl @ 2007-12-08 20:53 UTC
  To: Git Mailing List

It is very easy to use dmalloc with git. Follow the instructions here,
http://dmalloc.com/docs/latest/online/dmalloc_4.html

But using dmalloc shows a pervasive problem, none of the git commands
are cleaning up after themselves. For example I ran a simple command,
git-status, and thousands of objects were not freed.

Normally this doesn't hurt since exiting the process obviously frees
all of the memory. But when programming this way it becomes impossible
to tell which leaks were on purpose and which were accidental.

To sort this out, an #ifdef DMALLOC needs to be created, and code
for freeing all of the 'on purpose' leaks needs to be added inside that
#ifdef right before the process exits. The test scripts can then be
modified to ensure that everything is freed when the command exits.

I've used this process on several projects I've managed and it is a
very good thing to do. Once the new infrastructure is in place leaks
can be detected automatically and nipped in the bud before they get
out of control. The key to making this work is getting code in place
in the #ifdef to free those "on-purpose" leaks.
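
The scheme described above might look roughly like the sketch below. This is
only an illustration under stated assumptions, not code from git: the cache,
cache_add(), release_intentional_leaks() and leak_check_init() are all
hypothetical names. The one thing taken from the dmalloc side is that
dmalloc.h is included only when DMALLOC is defined, after the system headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef DMALLOC
#include "dmalloc.h"	/* leak-checking builds only; included after system headers */
#endif

/* Hypothetical cache that a command keeps alive until exit on purpose. */
static char **cache;
static size_t cache_nr;

/* Append a private copy of s; normal builds never free it. */
static void cache_add(const char *s)
{
	char **grown;
	char *copy;

	assert(s != NULL);
	grown = realloc(cache, (cache_nr + 1) * sizeof(*cache));
	copy = malloc(strlen(s) + 1);
	if (!grown || !copy)
		abort();
	strcpy(copy, s);
	cache = grown;
	cache[cache_nr++] = copy;
}

/* Free every "on purpose" leak so that only accidental ones are left. */
static void release_intentional_leaks(void)
{
	size_t i;

	for (i = 0; i < cache_nr; i++)
		free(cache[i]);
	free(cache);
	cache = NULL;
	cache_nr = 0;
}

/* Call once at startup; the cleanup is registered only in DMALLOC builds. */
static void leak_check_init(void)
{
#ifdef DMALLOC
	atexit(release_intentional_leaks);
#endif
}
```

A command would call leak_check_init() first thing in main(). In a normal
build nothing changes and the memory is left for the OS to reclaim at exit.
In a build with -DDMALLOC, anything still listed as unfreed in the dmalloc
log should then be an accidental leak.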

I tried a couple of years ago to add leak detection to Mozilla, but
Mozilla is way too far gone. There are 10,000 places where things are
allocated and never freed. Sorting out which of these were on-purpose
and which were accidental is a huge, labor-intensive manual task. If
Mozilla had followed a discipline of ensuring that nothing was ever
leaked (by using the scheme above), a lot of recent leak cleanup work
could have been avoided.

I don't know enough about the structure of git to add the cleanups in
#ifdefs before exit. People who wrote the commands are going to have
to help out with this.

diff --git a/Makefile b/Makefile
index 0a5df7a..426830c 100644
--- a/Makefile
+++ b/Makefile
@@ -752,7 +752,7 @@ SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
 TCLTK_PATH_SQ = $(subst ','\'',$(TCLTK_PATH))

-LIBS = $(GITLIBS) $(EXTLIBS)
+LIBS = $(GITLIBS) $(EXTLIBS) -ldmalloc

 BASIC_CFLAGS += -DSHA1_HEADER='$(SHA1_HEADER_SQ)' \
 	$(COMPAT_CFLAGS)
diff --git a/git-compat-util.h b/git-compat-util.h
index 79eb10e..8894c30 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -428,3 +428,5 @@ static inline int strtol_i(char const *s, int base, int *result)
 }

 #endif
+
+#include "dmalloc.h"

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: dmalloc and leaks in git
  2007-12-08 20:53 dmalloc and leaks in git Jon Smirl
@ 2007-12-08 20:58 ` Johannes Schindelin
  2007-12-08 21:02   ` Jon Smirl
  2007-12-09 20:57 ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2007-12-08 20:58 UTC
  To: Jon Smirl; +Cc: Git Mailing List

Hi,

On Sat, 8 Dec 2007, Jon Smirl wrote:

> It is very easy to use dmalloc with git. Follow the instructions here,
> http://dmalloc.com/docs/latest/online/dmalloc_4.html
> 
> But using dmalloc shows a pervasive problem, none of the git commands
> are cleaning up after themselves. For example I ran a simple command,
> git-status, and thousands of objects were not freed.

Known problem.  Goes by the name of "libification" on this list.

Ciao,
Dscho


* Re: dmalloc and leaks in git
  2007-12-08 20:58 ` Johannes Schindelin
@ 2007-12-08 21:02   ` Jon Smirl
  2007-12-08 21:19     ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Smirl @ 2007-12-08 21:02 UTC
  To: Johannes Schindelin; +Cc: Git Mailing List

On 12/8/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Sat, 8 Dec 2007, Jon Smirl wrote:
>
> > It is very easy to use dmalloc with git. Follow the instructions here,
> > http://dmalloc.com/docs/latest/online/dmalloc_4.html
> >
> > But using dmalloc shows a pervasive problem, none of the git commands
> > are cleaning up after themselves. For example I ran a simple command,
> > git-status, and thousands of objects were not freed.
>
> Known problem.  Goes by the name of "libification" on this list.

I tried using dmalloc to find the leak in repack but it is impossible
to sort out the accidental leaks from the on-purpose ones. On exit
there were millions of unfreed objects coming from all over the place.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: dmalloc and leaks in git
  2007-12-08 21:02   ` Jon Smirl
@ 2007-12-08 21:19     ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2007-12-08 21:19 UTC
  To: Jon Smirl; +Cc: Git Mailing List

Hi,

On Sat, 8 Dec 2007, Jon Smirl wrote:

> On 12/8/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > Hi,
> >
> > On Sat, 8 Dec 2007, Jon Smirl wrote:
> >
> > > It is very easy to use dmalloc with git. Follow the instructions here,
> > > http://dmalloc.com/docs/latest/online/dmalloc_4.html
> > >
> > > But using dmalloc shows a pervasive problem, none of the git commands
> > > are cleaning up after themselves. For example I ran a simple command,
> > > git-status, and thousands of objects were not freed.
> >
> > Known problem.  Goes by the name of "libification" on this list.
> 
> I tried using dmalloc to find the leak in repack but it is impossible
> to sort out the accidental leaks from the on-purpose ones. On exit
> there were millions of unfreed objects coming from all over the place.

This might be a starting point:

http://repo.or.cz/w/git/dscho.git?a=commitdiff;h=2083418c5010f04fbcd6e1f67de522ad6acd863d

Hth,
Dscho


* Re: dmalloc and leaks in git
  2007-12-08 20:53 dmalloc and leaks in git Jon Smirl
  2007-12-08 20:58 ` Johannes Schindelin
@ 2007-12-09 20:57 ` Linus Torvalds
  2007-12-10 16:34   ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2007-12-09 20:57 UTC
  To: Jon Smirl; +Cc: Git Mailing List



On Sat, 8 Dec 2007, Jon Smirl wrote:
> 
> But using dmalloc shows a pervasive problem, none of the git commands
> are cleaning up after themselves. For example I ran a simple command,
> git-status, and thousands of objects were not freed.

One thing to do is to use a better reporting tool.

For example, if you use

	valgrind --tool=massif --heap=yes ...

it will generate a postscript file with the allocation history as a graph 
of the different allocators in different colors etc. That would likely 
show where the big users come from..

		Linus


* Re: dmalloc and leaks in git
  2007-12-09 20:57 ` Linus Torvalds
@ 2007-12-10 16:34   ` Linus Torvalds
  2007-12-10 16:54     ` Nicolas Pitre
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2007-12-10 16:34 UTC
  To: Jon Smirl; +Cc: Git Mailing List



On Sun, 9 Dec 2007, Linus Torvalds wrote:
> 
> For example, if you use
> 
> 	valgrind --tool=massif --heap=yes ...

I tried this on my copy of the gcc thing, but I didn't do the extreme 
packing thing, so I never saw the 3.4GB usage. Massif just reported a 200M 
heap, and about half of that was "add_object_entry".

Of course, that doesn't report any mmap usage at all, so it totally 
ignores the mapping of the original pack-file itself (which will obviously 
be totally dense by the end, since we look at all objects).

It also doesn't take into account various secondary effects. For example, 
I don't think it looks at heap fragmentation issues etc, which normally 
aren't a noticeable thing, but maybe some particular allocation pattern 
can make the glibc allocator waste horrid amounts of memory or something 
like that.

			Linus


* Re: dmalloc and leaks in git
  2007-12-10 16:34   ` Linus Torvalds
@ 2007-12-10 16:54     ` Nicolas Pitre
  0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-12-10 16:54 UTC
  To: Linus Torvalds; +Cc: Jon Smirl, Git Mailing List

On Mon, 10 Dec 2007, Linus Torvalds wrote:

> 
> 
> On Sun, 9 Dec 2007, Linus Torvalds wrote:
> > 
> > For example, if you use
> > 
> > 	valgrind --tool=massif --heap=yes ...
> 
> I tried this on my copy of the gcc thing, but I didn't do the extreme 
> packing thing, so I never saw the 3.4GB usage. Massif just reported a 200M 
> heap, and about half of that was "add_object_entry".

So far, it seems that the problem occurs much more severely when you run 
'git repack -a -f' while using the already highly packed gcc repo as a 
starting point.

It remains to be determined whether it occurs only when the repack is
threaded, or whether threading has no significance.


Nicolas

