From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Uwe Kleine-König" <ukleinek@informatik.uni-freiburg.de>
Cc: Martin Waitz <tali@admingilde.org>,
Junio C Hamano <junkio@cox.net>,
Josef Weidendorfer <Josef.Weidendorfer@gmx.de>,
Eric Lesh <eclesh@ucla.edu>, Matthieu Moy <Matthieu.Moy@imag.fr>,
git@vger.kernel.org
Subject: Re: Submodule object store
Date: Tue, 27 Mar 2007 11:41:11 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0703271124590.6730@woody.linux-foundation.org>
In-Reply-To: <20070327172216.GA24200@informatik.uni-freiburg.de>
On Tue, 27 Mar 2007, Uwe Kleine-König wrote:
>
> embeddedproject$ git ls-tree HEAD | grep linux
> 040000 commit 0123456789abcde0... linux-2.6
>
> (or however you save submodules). Then you might have to duplicate the
> objects of linux-2.6, because they are part of both histories.
No they are not. Unless you do it wrong.
The *only* object that is part of the superproject would be the tree that
*contains* that entry itself.
We should *never* automatically follow such an entry down, *exactly*
because that doesn't scale. So to actually follow that entry for something
like a recursive diff, you'd literally "cd into linux, and start 'git diff'
from commit 0123456.."
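To make the "don't follow the entry down" point concrete, here is a minimal Python sketch. It assumes ls-tree style lines of the form "<mode> <type> <sha> <name>" as in the quoted example; the names TreeEntry, parse_ls_tree and subproject_links are made up for illustration and are not git internals. The superproject only records the link; it never opens the subproject's objects.

```python
from typing import List, NamedTuple


class TreeEntry(NamedTuple):
    mode: str
    type: str
    sha: str
    name: str


def parse_ls_tree(text: str) -> List[TreeEntry]:
    """Parse ls-tree style lines: '<mode> <type> <sha> <name>'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any whitespace inside the path name intact
        mode, type_, sha, name = line.split(None, 3)
        entries.append(TreeEntry(mode, type_, sha, name))
    return entries


def subproject_links(entries: List[TreeEntry]) -> List[TreeEntry]:
    """Return entries that point at a commit (a subproject link).

    Nothing here descends into the subproject: the link is a leaf.
    """
    return [e for e in entries if e.type == "commit"]
```

Anything that wants the subproject's contents would have to go to the subproject itself, starting from the recorded commit.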
In other words, the subproject would be its own project, and the
superproject never sees it as "part of itself". I really think, for
example, that the "git diff" family of programs (diff-index, diff-tree,
diff-files) and things like "git ls-tree" should literally:
 - have a mode where they don't even recurse into subprojects, and I
   personally think that it could/should be the default!
 - when they recurse, they should literally (at least to begin with) do
   that kind of "fork() ; if (child) { chdir(subproject); execve(myself) }"
The latter is really to make sure that *even*by*mistake* we don't screw
things up and tie the sub/superproject together too tightly.
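The fork/chdir/exec idea can be sketched roughly as follows in Python (POSIX only, since it uses os.fork). The function name run_in_subproject and its arguments are hypothetical; a real tool would re-execute itself rather than take an arbitrary argv. The point is that the child is a completely separate process whose working directory is the subproject, so it shares no in-memory state with the superproject run.

```python
import os


def run_in_subproject(subproject_dir, argv):
    """Run argv in a child process whose cwd is subproject_dir.

    Returns the child's exit status, or -1 if it did not exit normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child: move into the subproject and replace ourselves with argv.
        try:
            os.chdir(subproject_dir)
            os.execvp(argv[0], argv)
        except OSError:
            # chdir or exec failed; leave without running parent cleanup
            os._exit(127)
    # Parent: wait for the child; our own working directory is unchanged.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1
```

A caller might use something like run_in_subproject("linux-2.6", ["git", "diff"]), where the directory name is only an example.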
I'm serious. I really think that the first version (which ends up being
the one that sets semantics) should be very careful here, so that
subprojects never get mixed up with the superproject.
And I'm also serious about the "don't recurse into subproject by default
at all". If I'm at the superproject, and I maintain the superproject, I
think the state of the subprojects themselves is a totally separate
issue. It's quite a valid thing to do to maintain the build
infrastructure, and if I'm the maintainer of that, and I do "git diff", I
sure as hell don't want to wait for git to do "git diff" on the
subprojects when there are 5000 of them!
Sure, "git diff" is fast (on the kernel, it takes me 0.069s on a clean
tree), but
 - multiply that 0.069s by 5000 and it's not so fast any more
 - when you have a thousand subprojects, it's quite possible (even likely)
   that all your directories won't fit in the cache any more, and suddenly
   even a single "git diff" takes several seconds.
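The multiplication above, spelled out as a tiny Python calculation. It assumes (optimistically) that every subproject diff runs at the warm-cache speed and that the runs happen one after another; the constant and function names are just for illustration.

```python
WARM_DIFF_SECONDS = 0.069   # warm-cache "git diff" time quoted above
SUBPROJECTS = 5000          # number of subprojects in the example


def total_seconds(per_run_seconds, runs):
    """Total wall-clock time for `runs` sequential runs."""
    return per_run_seconds * runs


warm_total = total_seconds(WARM_DIFF_SECONDS, SUBPROJECTS)
print(f"{warm_total:.0f} s, about {warm_total / 60:.2f} minutes")
# prints "345 s, about 5.75 minutes"
```

So even in the best case, a recursive diff goes from "instant" to almost six minutes.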
Really! Try this on the Linux tree (that "drop_caches" thing needs root
privileges):
	echo 3 > /proc/sys/vm/drop_caches
	git diff
and see it take something like 5 seconds. Now, imagine that you have a
hundred subprojects, and they're big enough that the caches are *never*
warm.
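If you want to measure this yourself, a small Python timing helper along these lines would do. It is only a sketch: time_command is a made-up name, it does not drop the caches for you (that still needs the root-only step above), and the numbers you get depend entirely on your machine.

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Return the wall-clock seconds taken to run argv (output discarded)."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start


# Example (the path is a placeholder for wherever your kernel tree lives):
#   time_command(["git", "diff"], cwd="/path/to/linux")
```

Run it once right after dropping the caches and once again immediately afterwards to compare cold and warm timings. At roughly 5 seconds per cold run, a hundred subprojects would come to about 500 seconds.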
People sometimes don't seem to understand what "scalability" really means.
Scalability means that something that is so fast that you don't even
*think* about it will become a major bottleneck when you do it a thousand
times, and the working set has grown so big that it totally blows out
several levels of caches (both CPU caches and disk caches).
Linus