git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	gcc@gcc.gnu.org,  Git Mailing List <git@vger.kernel.org>
Subject: Re: Something is broken in repack
Date: Wed, 12 Dec 2007 10:48:12 -0500 (EST)	[thread overview]
Message-ID: <alpine.LFD.0.99999.0712120743040.555@xanadu.home> (raw)
In-Reply-To: <alpine.LFD.0.99999.0712112057390.555@xanadu.home>

On Wed, 12 Dec 2007, Nicolas Pitre wrote:

> Add memory fragmentation to that and you have a clogged system.
> 
> Solution: 
> 
> 	pack.deltacachesize=1
> 	pack.windowmemory=16M
> 
> Limiting the window memory to 16MB will automatically shrink the window 
> size when big objects are encountered, therefore keeping much fewer of 
> those objects at the same time in memory, which in turn means they will 
> be processed much more quickly.  And somehow that must help with memory 
> fragmentation as well.

OK scrap that.

When I returned to the computer this morning, the repack was 
completed... with a 1.3GB pack instead.

So... The gcc repo apparently really needs a large window to efficiently 
compress those large objects.

But when those large objects are already well deltified and you repack 
again with a large window, somehow the memory allocator is way more 
involved, probably even 
more so when there are several threads in parallel amplifying the issue, 
and things probably get to a point of no return with regard to memory 
fragmentation after a while.

So... my conclusion is that the glibc allocator has fragmentation issues 
with this work load, given the notable difference with the Google 
allocator, which itself might not be completely immune to fragmentation 
issues of its own.  And because the gcc repo requires a large window of 
big objects to get good compression, then you're better not using 4 
threads to repack it with -a -f.  The fact that the size of the source 
pack has such an influence is probably only because the increased usage 
of the delta base object cache is playing a role in the global memory 
allocation pattern, allowing for the bad fragmentation issue to occur.

If you could run one last test with the mallinfo patch I posted, without 
the pack.windowmemory setting, and adding the reported values along with 
those from top, then we could formally conclude to memory fragmentation 
issues.

So I don't think Git itself is actually bad.  The gcc repo most 
certainly constitute a nasty use case for memory allocators, but I don't 
think there is much we can do about it besides possibly implementing our 
own memory allocator with active defragmentation where possible (read 
memcpy) at some point to give glibc's allocator some chance to breathe a 
bit more.

In the mean time you might have to use only one thread and lots of 
memory to repack the gcc repo, or find the perfect memory allocator to 
be used with Git.  After all, packing the whole gcc history to around 
230MB is quite a stunt but it requires sufficient resources to 
achieve it. Fortunately, like Linus said, such a wholesale repack is not 
something that most users have to do anyway.


Nicolas

  parent reply	other threads:[~2007-12-12 15:49 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-07 23:05 Something is broken in repack Jon Smirl
2007-12-08  0:37 ` Linus Torvalds
2007-12-08  1:27   ` [PATCH] pack-objects: fix delta cache size accounting Nicolas Pitre
2007-12-08  1:46 ` Something is broken in repack Nicolas Pitre
2007-12-08  2:04   ` Jon Smirl
2007-12-08  2:28     ` Nicolas Pitre
2007-12-08  3:29       ` Jon Smirl
2007-12-08  3:37         ` David Brown
2007-12-08  4:22           ` Jon Smirl
2007-12-08  4:30             ` Nicolas Pitre
2007-12-08  5:01               ` Jon Smirl
2007-12-08  5:12                 ` Nicolas Pitre
2007-12-08  3:48         ` Harvey Harrison
2007-12-08  2:22   ` Jon Smirl
2007-12-08  3:44   ` Harvey Harrison
2007-12-08 22:18   ` Junio C Hamano
2007-12-09  8:05     ` Junio C Hamano
2007-12-09 15:19       ` Jon Smirl
2007-12-09 18:25       ` Jon Smirl
2007-12-10  1:07         ` Nicolas Pitre
2007-12-10  2:49     ` Nicolas Pitre
2007-12-08  2:56 ` David Brown
2007-12-10 19:56 ` Nicolas Pitre
2007-12-10 20:05   ` Jon Smirl
2007-12-10 20:16     ` Morten Welinder
2007-12-11  2:25 ` Jon Smirl
2007-12-11  2:55   ` Junio C Hamano
2007-12-11  3:27     ` Nicolas Pitre
2007-12-11 11:08       ` David Kastrup
2007-12-11 12:08         ` Pierre Habouzit
2007-12-11 12:18           ` David Kastrup
2007-12-11  3:49   ` Nicolas Pitre
2007-12-11  5:25     ` Jon Smirl
2007-12-11  5:29       ` Jon Smirl
2007-12-11  7:01         ` Jon Smirl
2007-12-11  7:34           ` Andreas Ericsson
2007-12-11 13:49           ` Nicolas Pitre
2007-12-11 15:00             ` Nicolas Pitre
2007-12-11 15:36               ` Jon Smirl
2007-12-11 16:20               ` Nicolas Pitre
2007-12-11 16:21                 ` Jon Smirl
2007-12-12  5:12                   ` Nicolas Pitre
2007-12-12  8:05                     ` David Kastrup
2007-12-14 16:18                       ` Wolfram Gloger
2007-12-12 15:48                     ` Nicolas Pitre [this message]
2007-12-12 16:17                       ` Paolo Bonzini
2007-12-12 16:37                       ` Linus Torvalds
2007-12-12 16:42                         ` David Miller
2007-12-12 16:54                           ` Linus Torvalds
2007-12-12 17:12                         ` Jon Smirl
2007-12-14 16:12                         ` Wolfram Gloger
2007-12-14 16:45                           ` David Kastrup
2007-12-14 16:59                             ` Wolfram Gloger
2007-12-13 13:32                       ` Nguyen Thai Ngoc Duy
2007-12-13 15:32                         ` Paolo Bonzini
2007-12-13 16:29                           ` Paolo Bonzini
2007-12-13 16:39                           ` Johannes Sixt
2007-12-14  1:04                             ` Jakub Narebski
2007-12-14  6:14                               ` Paolo Bonzini
2007-12-14  6:24                                 ` Nguyen Thai Ngoc Duy
2007-12-14  8:20                                   ` Paolo Bonzini
2007-12-14  9:01                                     ` Harvey Harrison
2007-12-14 10:40                                   ` Jakub Narebski
2007-12-14 10:52                                     ` Nguyen Thai Ngoc Duy
2007-12-14 13:25                                 ` Nicolas Pitre
2007-12-12 16:13                     ` Nicolas Pitre
2007-12-13  7:32                       ` Andreas Ericsson
2007-12-14 16:03                         ` Wolfram Gloger
2007-12-11 16:33           ` Linus Torvalds
2007-12-11 17:21             ` Nicolas Pitre
2007-12-11 17:24               ` David Miller
2007-12-11 17:44                 ` Nicolas Pitre
2007-12-11 20:26                   ` Andreas Ericsson
2007-12-11 18:43             ` Jon Smirl
2007-12-11 18:57               ` Nicolas Pitre
2007-12-11 19:17               ` Linus Torvalds
2007-12-11 19:40                 ` Junio C Hamano
2007-12-11 20:34                   ` Andreas Ericsson
2007-12-11 17:28           ` Daniel Berlin
2007-12-11 13:31         ` Nicolas Pitre
2007-12-11  6:01       ` Sean
2007-12-11  6:20         ` Jon Smirl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.0.99999.0712120743040.555@xanadu.home \
    --to=nico@cam.org \
    --cc=gcc@gcc.gnu.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonsmirl@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).