From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: If I were redoing git from scratch...
Date: Sat, 4 Nov 2006 08:44:02 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0611040829040.25218@g5.osdl.org>
In-Reply-To: <7vpsc3xx65.fsf@assigned-by-dhcp.cox.net>
On Sat, 4 Nov 2006, Junio C Hamano wrote:
>
> The biggest one is that we use too many static (worse, function
> scope static) variables that live for the life of the process,
> which makes many things very nice and easy ("run-once and let
> exit clean up the mess" mentality), but because of this it
> becomes awkward to do certain things. Examples are:
>
> - Multiple invocations of merge-bases (needs clearing the
> marks left on commit objects by earlier traversal),
Well, quite frankly, I dare anybody to do it differently, yet have good
performance with millions of objects.
The fact is, I don't think it _can_ be done. I would seriously suggest
revisiting this in five years, just because CPUs and memory will by then
hopefully have gotten an order of magnitude faster/bigger.
The thing is, the object database when we read it in really needs to be
pretty compact-sized, and we need to remember objects we've seen earlier
(exactly _because_ we depend on the flags). So there are exactly two
alternatives:
- global life-time allocations of objects like we do now
- magic memory management with unknown lifetimes and keeping track of all
pointers.
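The first alternative can be illustrated with a minimal sketch. It assumes a simplified, hypothetical object type and table (`struct object`, `lookup_or_create`, and `TABLE_SIZE` are illustrative names, not git's actual code): every object lives in a global hash table for the whole process, so the same id always maps to the same in-memory object and flags set by an earlier traversal are still there later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model; git's real struct object differs. */
#define HASH_LEN 20
#define TABLE_SIZE 1024 /* power of two, fixed size: no resizing in this sketch */

struct object {
    unsigned char id[HASH_LEN]; /* object name (SHA-1 bytes) */
    unsigned flags;             /* traversal marks, e.g. "seen" */
    struct object *next;        /* hash-chain link */
};

static struct object *table[TABLE_SIZE];
static size_t nr_objects;

static size_t bucket(const unsigned char *id)
{
    /* SHA-1 bytes are already well distributed: use the first four. */
    uint32_t h;
    memcpy(&h, id, sizeof(h));
    return h & (TABLE_SIZE - 1);
}

/* Return the unique in-memory object for this id, creating it on first use.
 * Nothing is ever freed: the hash chains keep every object reachable for
 * the life of the process, so its flags stay meaningful across traversals. */
struct object *lookup_or_create(const unsigned char *id)
{
    assert(id != NULL);
    size_t b = bucket(id);
    for (struct object *o = table[b]; o; o = o->next)
        if (!memcmp(o->id, id, HASH_LEN))
            return o;

    struct object *o = calloc(1, sizeof(*o));
    if (!o)
        abort();
    memcpy(o->id, id, HASH_LEN);
    o->next = table[b];
    table[b] = o;
    nr_objects++;
    return o;
}
```

This also shows why a garbage collector would find nothing to reclaim here: every object is reachable from `table` by construction.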
And I'd like to point out that the second alternative, "magic" memory
management, is simply not realistic right now:
- it's too damn hard. A simple garbage collector based on the approach we
have now would simply not be able to do anything, since all objects are
_by_definition_ reachable from the hash chains, so there's nothing to
collect. The lifetime of an object fundamentally _is_ the whole process
lifetime, exactly because we expect the objects (and the object flags
in particular) to be meaningful.
- pretty much all garbage collection schemes tend to have a memory
footprint that is about twice what a static footprint is under any
normal load. Think about what we already do with "git pack-objects" for
something like the mozilla repository: I worked quite a lot on getting
the memory footprint down, and it's _still_ several hundred MB.
In other words, I can pretty much guarantee that some kind of "smarter"
memory management would be a huge step backwards. Yes, we now have to do
some things explicitly, but exactly because we do them explicitly we can
_afford_ to have the stupid and simple and VERY EFFICIENT memory
management ("lack of memory management") that we have now.
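One way to get that kind of "lack of memory management" is to carve fixed-size nodes out of large blocks and never free them individually. The sketch below is a hypothetical illustration of the idea (`struct node_pool`, `pool_alloc`, and `NODES_PER_BLOCK` are assumed names, not git's allocator): there is no per-object header and no free list, and old blocks are simply left for process exit to reclaim.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: hand out fixed-size nodes from large blocks and
 * never free them individually. No per-node header, no free list. */
#define NODES_PER_BLOCK 1024

struct node_pool {
    size_t node_size; /* bytes per node; use sizeof(the struct) to keep alignment */
    char *block;      /* next unused node in the current block */
    size_t left;      /* nodes still unused in the current block */
    size_t blocks;    /* number of blocks allocated so far */
};

void *pool_alloc(struct node_pool *p)
{
    assert(p->node_size > 0);
    if (!p->left) {
        /* Start a new zeroed block; the previous one is intentionally
         * never freed and stays valid until the process exits. */
        p->block = calloc(NODES_PER_BLOCK, p->node_size);
        if (!p->block)
            abort();
        p->left = NODES_PER_BLOCK;
        p->blocks++;
    }
    p->left--;
    void *ret = p->block;
    p->block += p->node_size;
    return ret;
}
```

The trade-off is the one described above: allocation is a pointer bump and the footprint is just the nodes themselves, but individual objects can never be returned, and leak checkers will report the abandoned blocks.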
The memory use of git had a very real correlation with performance when I
was doing the memory shrinking a few months back (in June). I realize
that it's perhaps awkward, but I would really want people to realize that
it's a huge performance issue. It was a clear performance issue for me
(and I use machines with 2GB of RAM, so I was never swapping), and it would
be an even bigger one for anybody for whom the size meant having to start
paging.
So I would seriously ask you not to even consider changing the object
model. Maybe add a few more helper routines to clear all object flags or
something, but the "every object is global and will never be de-allocated"
is really a major deal.
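A helper of the kind suggested here could look roughly like the following sketch. It assumes a hypothetical registry of every object ever created (`new_object`, `all_objects`, and `clear_all_object_flags` are illustrative names, not git functions): objects are still global and never de-allocated, but a single pass resets chosen mark bits so that, for example, a second merge-base computation does not see the marks left by the first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal registry: every object ever created is appended
 * here and never removed, matching the "global, never de-allocated" model. */
struct object {
    unsigned flags; /* traversal marks */
};

static struct object **all_objects;
static size_t nr_objects, alloc_objects;

struct object *new_object(void)
{
    if (nr_objects == alloc_objects) {
        alloc_objects = alloc_objects ? 2 * alloc_objects : 64;
        all_objects = realloc(all_objects, alloc_objects * sizeof(*all_objects));
        if (!all_objects)
            abort();
    }
    struct object *o = calloc(1, sizeof(*o));
    if (!o)
        abort();
    all_objects[nr_objects++] = o;
    return o;
}

/* Clear the given mark bits on every object so the next traversal starts
 * from a clean slate. One linear pass; nothing is freed or reallocated. */
void clear_all_object_flags(unsigned mask)
{
    assert(mask != 0);
    for (size_t i = 0; i < nr_objects; i++)
        all_objects[i]->flags &= ~mask;
}
```

Because the helper takes a mask, unrelated bits owned by other code survive the reset.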
Five years from now, or for somebody who re-implements git in Java (where
performance isn't going to be the major issue anyway, and you probably do
"small" things like "commit" and "diff", and never do full-database things
like "git repack"), _then_ you can happily look at having something
fancier. Right now, it's too easy to just look at cumbersome interfaces,
and forget about the fact that those interfaces are sometimes what allow
us to practically do some things in the first place.