From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Jeff King <peff@peff.net>,
jnareb@gmail.com, Nicolas Pitre <nico@cam.org>,
"Shawn O. Pearce" <spearce@spearce.org>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: git-daemon on NSLU2
Date: Sun, 26 Aug 2007 10:15:24 -0700 (PDT)
Message-ID: <alpine.LFD.0.999.0708260959050.25853@woody.linux-foundation.org>
In-Reply-To: <9e4733910708260934i1381e73ftb31c7de0d23f6cae@mail.gmail.com>
On Sun, 26 Aug 2007, Jon Smirl wrote:
>
> Changing git-daemon only for the initial clone case also means that
> people don't need to change the way they manage packs.
I do agree that we might want to do some special-case handling for the
initial clone (because it *is* kind of special), but it's not necessarily
as easy as just re-using an existing pack.
At a minimum, we'd need to have something that knows how to make a single
pack out of several packs and some loose objects. That shouldn't be
*hard*, but it's certainly nontrivial, especially in the presence of the
same objects possibly being available more than once in different packs.
[ The "duplicate object" thing does actually happen: even if you use only
"git native" protocols, you can get duplicate objects because a file was
changed back to an earlier version. The incremental packs you get from
push/pull'ing between two repositories try to send the minimal
incremental changes, but the keyword here is _try_: they will
potentially send objects that the receiver already has, if it's not
obvious that the receiver has them from the "commit boundary" cases ]
Maybe the client side will handle a pack with duplicate objects perfectly
fine, and it's not an issue. Maybe. It might even be likely (I can't think
of anything that would obviously break). But at a minimum, it would be
something that needs some code on the sending side, and a lot of
verification that the end result works ok on the receiving side.
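To make the "single pack out of several packs and some loose objects" step concrete, here is a minimal sketch of the de-duplication part only. It is not git's actual pack code: the `(object_id, payload)` tuples, the list-of-tuples "pack" model, and the function name `merge_object_sources` are all assumptions made for illustration. Real packs also involve delta chains and an index, which this ignores.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model: an object is (object_id, payload), and a "pack" is
# simply an iterable of such objects. Real git packs are far more involved.
GitObject = Tuple[str, bytes]


def merge_object_sources(
    packs: Iterable[Iterable[GitObject]],
    loose: Iterable[GitObject],
) -> List[GitObject]:
    """Combine several packs plus loose objects, emitting each object once.

    The first copy seen wins. Since an object id is derived from the
    content, later copies with the same id are assumed to be identical.
    """
    merged: Dict[str, bytes] = {}
    for source in [*packs, loose]:
        for oid, payload in source:
            # setdefault keeps the first copy and ignores duplicates.
            merged.setdefault(oid, payload)
    # Dicts preserve insertion order, so output order follows first sight.
    return list(merged.items())


pack_a = [("a1", b"x"), ("b2", b"y")]
pack_b = [("b2", b"y"), ("c3", b"z")]   # "b2" duplicated across packs
loose_objects = [("a1", b"x"), ("d4", b"w")]  # "a1" also exists loose

combined = merge_object_sources([pack_a, pack_b], loose_objects)
```

The sketch only shows that the sending side has to track which object ids it has already written. Whether a receiver copes with duplicates that slip through is the separate verification question raised above.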
And there's actually a deeper problem: the current native protocol
guarantees that the objects sent over are only those that are reachable.
That matters. It matters for subtle security issues (maybe you are
exporting some repository that was rebased, and has objects that you
didn't *intend* to make public!), but it also matters for issues like git
"alternates" files.
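The reachability guarantee can be illustrated with a small graph walk. This is a sketch under assumptions, not the native protocol's implementation: the object graph is modelled as a plain dict from object id to the ids it references, and `reachable_objects` is a hypothetical helper name. The point is that an object sitting in a pack on disk (say, left over from a rebase) is not sent unless some ref leads to it.

```python
from typing import Dict, Iterable, List, Set


def reachable_objects(
    refs: Iterable[str],
    edges: Dict[str, List[str]],
) -> Set[str]:
    """Return every object id reachable from the given ref tips.

    `edges` maps an object id to the ids it points at (commit -> parents
    and tree, tree -> blobs and subtrees). This is an illustrative model.
    """
    seen: Set[str] = set()
    stack = list(refs)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(edges.get(oid, ()))
    return seen


# Hypothetical repository: "old_commit" was rebased away, so no ref points
# at it, but its objects may still be stored in an existing pack.
object_graph = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree1": ["blob1"],
    "tree2": ["blob1", "blob2"],
    "old_commit": ["old_tree"],
    "old_tree": ["secret_blob"],
}

to_send = reachable_objects(["commit2"], object_graph)
```

Re-using an on-disk pack verbatim would skip this walk, which is exactly why unreachable objects like `secret_blob` could leak.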
If you only ever look at a single repo, you'll never see the alternates
issue, but if you're seriously looking at serving git repositories, I
don't really see the "single repo" case as being at all the most common or
interesting case.
And if you look at something like kernel.org, the "alternates" thing is
*much* more important than how much memory git-daemon uses! Yes,
kernel.org would probably be much happier if git-daemon wasn't such a
memory pig occasionally, but on the other hand, the win from using
alternates and being able to share 99% of all objects in all the various
related kernel repositories is actually likely to be a *bigger* memory win
than any git-daemon memory usage, because now the disk caching works a
hell of a lot better!
So it's not actually clear how the initial clone thing can be optimized on
the server side.
It's easier to optimize on the *client* side: just do the initial clone
with rsync/http (and "git gc" it on the client afterwards), and then
change it to the git native protocol after the clone.
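The client-side workaround amounts to three steps, sketched here as a command plan. The URLs, the directory name, and the helper names `clone_then_switch_plan` and `run_plan` are placeholders, and the sketch uses http for the bulk transfer because newer git versions no longer ship the rsync transport. Setting `remote.origin.url` through `git config` is one way to repoint the remote; it assumes the remote is named `origin`, which is what `git clone` sets up by default.

```python
import subprocess
from typing import List, Optional, Tuple

# One step is (argv, working directory or None for the current directory).
Step = Tuple[List[str], Optional[str]]


def clone_then_switch_plan(bulk_url: str, native_url: str, dest: str) -> List[Step]:
    """Build the commands for: bulk clone, repack locally, switch protocol."""
    return [
        # 1. Initial clone over a "dumb" transport, so the server does no packing.
        (["git", "clone", bulk_url, dest], None),
        # 2. Repack on the client, which has the memory to spare.
        (["git", "gc"], dest),
        # 3. Point the remote at the native protocol for later fetches.
        (["git", "config", "remote.origin.url", native_url], dest),
    ]


def run_plan(plan: List[Step]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for argv, cwd in plan:
        subprocess.run(argv, cwd=cwd, check=True)


# Placeholder URLs; nothing is executed until run_plan(plan) is called.
plan = clone_then_switch_plan(
    "http://example.org/repo.git",
    "git://example.org/repo.git",
    "repo",
)
```

Subsequent fetches then use the native protocol, where the incremental packs are small enough that server memory is much less of a concern.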
That may not sound very user-friendly, but let's face it, I think there is
exactly one person in the whole universe that tries to use an NSLU2 as a
git server. So the "client-side workaround" is likely to affect a very
limited number of clients ;)
Linus