git.vger.kernel.org archive mirror
From: Nicolas Pitre <nico@cam.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Junio C Hamano <gitster@pobox.com>,
	gcc@gcc.gnu.org, Git Mailing List <git@vger.kernel.org>
Subject: Re: Something is broken in repack
Date: Tue, 11 Dec 2007 13:57:22 -0500 (EST)
Message-ID: <alpine.LFD.0.99999.0712111349270.555@xanadu.home>
In-Reply-To: <9e4733910712111043h6a361996x740f4dba3d742da5@mail.gmail.com>

On Tue, 11 Dec 2007, Jon Smirl wrote:

> This makes sense. Those runs that blew up to 4.5GB were a combination
> of this effect and fragmentation in the gcc allocator.

I disagree.  This is insane.

> Google allocator appears to be much better at controlling fragmentation.

Indeed.  And if fragmentation is indeed wasting half of Git's memory 
usage then we'll have to come up with a custom memory allocator.

> Is there a reasonable scheme to force the chains to only be loaded
> once and then shared between worker threads? The memory blow up
> appears to be directly correlated with chain length.

No.  That would be the equivalent of holding each revision of all files 
uncompressed all at once in memory.

> > That said, I suspect there are a few things fighting you:
> >
> >  - threading is hard. I haven't looked a lot at the changes Nico did to do
> >    a threaded object packer, but what I've seen does not convince me it is
> >    correct. The "trg_entry" accesses are *mostly* protected with
> >    "cache_lock", but nothing else really seems to be, so quite frankly, I
> >    wouldn't trust the threaded version very much. It's off by default, and
> >    for a good reason, I think.
> >
> >    For example: the packing code does this:
> >
> >         if (!src->data) {
> >                 read_lock();
> >                 src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
> >                 read_unlock();
> >                 ...
> >
> >    and that's racy. If two threads come in at roughly the same time and
> >    see a NULL src->data, they'll both get the lock, and they'll both
> >    (serially) try to fill it in. It will all *work*, but one of them will
> >    have done unnecessary work, and one of them will have their result
> >    thrown away and leaked.
> 
> That may account for the threaded version needing an extra 20 minutes
> CPU time.  An extra 12% of CPU seems like too much overhead for
> threading. Just letting a couple of those long chain compressions be
> done twice

No it may not.  This theory is wrong as explained before.
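
For illustration only, the check-then-lock pattern in the quoted snippet 
can be closed by doing the NULL test while the lock is held, so that a 
second thread sees the data the first one loaded instead of loading it 
again.  This is a minimal sketch under stated assumptions, not the 
pack-objects code: struct src_slot, expensive_load(), get_data() and 
run_threads() are hypothetical names, and expensive_load() merely stands 
in for read_sha1_file().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the packer's per-object source slot. */
struct src_slot {
	void *data;	/* NULL until some thread has loaded the object */
};

static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;
static int load_count;	/* number of loads performed; guarded by read_mutex */

/* Hypothetical stand-in for read_sha1_file(): allocates a dummy buffer. */
static void *expensive_load(void)
{
	char *buf = malloc(16);

	assert(buf != NULL);
	strcpy(buf, "object data");
	load_count++;
	return buf;
}

/* Test and fill the slot entirely under the lock: only one thread loads. */
static void *get_data(struct src_slot *src)
{
	void *data;

	pthread_mutex_lock(&read_mutex);
	if (!src->data)
		src->data = expensive_load();
	data = src->data;
	pthread_mutex_unlock(&read_mutex);
	return data;
}

static void *worker(void *arg)
{
	return get_data(arg);
}

/* Run nthreads workers against one shared slot; return the load count. */
int run_threads(int nthreads)
{
	struct src_slot slot = { NULL };
	pthread_t *tids = malloc(nthreads * sizeof(*tids));
	int i, rc, loads;

	assert(tids != NULL);
	load_count = 0;
	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&tids[i], NULL, worker, &slot);
		assert(rc == 0);
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);

	loads = load_count;	/* all workers have been joined */
	free(slot.data);
	free(tids);
	return loads;
}
```

With this shape, run_threads(8) performs exactly one load no matter how 
the threads interleave.  Taking the lock on every access is the simplest 
correct form; skipping the lock on the fast path would need atomic 
accesses to src->data, which this sketch deliberately avoids.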

> >
> >    Are you hitting issues like this? I dunno. The object sorting means
> >    that different threads normally shouldn't look at the same objects (not
> >    even the sources), so probably not, but basically, I wouldn't trust the
> >    threading 100%. It needs work, and it needs to stay off by default.
> >
> >  - you're working on a problem that isn't really even worth optimizing
> >    that much. The *normal* case is to re-use old deltas, which makes all
> >    of the issues you are fighting basically go away (because you only have
> >    a few _incremental_ objects that need deltaing).
> 
> I agree, this problem only occurs when people import giant
> repositories. But every time someone hits these problems they declare
> git to be screwed up and proceed to trash it in their blogs.

It's not only for repack.  Someone just reported git-blame being 
unusable too due to insane memory usage, which I suspect is due to the 
same issue.


Nicolas
