From: Nicolas Pitre <nico@cam.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Junio C Hamano <gitster@pobox.com>,
gcc@gcc.gnu.org, Git Mailing List <git@vger.kernel.org>
Subject: Re: Something is broken in repack
Date: Tue, 11 Dec 2007 13:57:22 -0500 (EST)
Message-ID: <alpine.LFD.0.99999.0712111349270.555@xanadu.home>
In-Reply-To: <9e4733910712111043h6a361996x740f4dba3d742da5@mail.gmail.com>
On Tue, 11 Dec 2007, Jon Smirl wrote:
> This makes sense. Those runs that blew up to 4.5GB were a combination
> of this effect and fragmentation in the gcc allocator.
I disagree. This is insane.
> Google allocator appears to be much better at controlling fragmentation.
Indeed. And if fragmentation really is wasting half of Git's memory
usage, then we'll have to come up with a custom memory allocator.
> Is there a reasonable scheme to force the chains to only be loaded
> once and then shared between worker threads? The memory blow up
> appears to be directly correlated with chain length.
No. That would be the equivalent of holding each revision of all files
uncompressed all at once in memory.
> > That said, I suspect there are a few things fighting you:
> >
> > - threading is hard. I haven't looked a lot at the changes Nico did to do
> > a threaded object packer, but what I've seen does not convince me it is
> > correct. The "trg_entry" accesses are *mostly* protected with
> > "cache_lock", but nothing else really seems to be, so quite frankly, I
> > wouldn't trust the threaded version very much. It's off by default, and
> > for a good reason, I think.
> >
> > For example: the packing code does this:
> >
> >     if (!src->data) {
> >         read_lock();
> >         src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
> >         read_unlock();
> >         ...
> >
> > and that's racy. If two threads come in at roughly the same time and
> > see a NULL src->data, they'll both get the lock, and they'll both
> > (serially) try to fill it in. It will all *work*, but one of them will
> > have done unnecessary work, and one of them will have their result
> > thrown away and leaked.
>
> That may account for the threaded version needing an extra 20 minutes
> CPU time. An extra 12% of CPU seems like too much overhead for
> threading. Just letting a couple of those long chain compressions be
> done twice
No, it may not. This theory is wrong, as explained before.
> >
> > Are you hitting issues like this? I dunno. The object sorting means
> > that different threads normally shouldn't look at the same objects (not
> > even the sources), so probably not, but basically, I wouldn't trust the
> > threading 100%. It needs work, and it needs to stay off by default.
> >
> > - you're working on a problem that isn't really even worth optimizing
> > that much. The *normal* case is to re-use old deltas, which makes all
> > of the issues you are fighting basically go away (because you only have
> > a few _incremental_ objects that need deltaing).
>
> I agree, this problem only occurs when people import giant
> repositories. But every time someone hits these problems they declare
> git to be screwed up and proceed to trash it in their blogs.
It's not only for repack. Someone just reported git-blame being
unusable too due to insane memory usage, which I suspect is due to the
same issue.
Nicolas