* Kernel Hacker's guide to git (updated)
From: Jeff Garzik @ 2005-07-15 17:36 UTC (permalink / raw)
To: Linux Kernel; +Cc: Git Mailing List, Dave Jones
I've updated my git quickstart guide at
http://linux.yyz.us/git-howto.html
It now points to DaveJ's daily snapshots for the initial bootstrap
tarball, has been reorganized for easier navigation, and includes a few
other improvements.
Also, a bonus recipe: how to import Linus's pack files (it's easy).
This recipe presumes that you have a vanilla-Linus repo
(/repo/linux-2.6) and your own repo (/repo/myrepo-2.6).
$ cd /repo/myrepo-2.6
$ git-fsck-cache			# fsck #1: make sure we start out OK
$ git pull /repo/linux-2.6/.git		# make sure we're up to date
$ cp -al ../linux-2.6/.git/objects/pack .git/objects	# hardlink the pack files
$ cp ../linux-2.6/.git/refs/tags/* .git/refs/tags	# copy the tags over
$ git-prune-packed			# remove loose objects now stored in packs
$ git-fsck-cache			# fsck #2: make sure we're still OK
This recipe reduced my kernel.org sync from ~50,000 files to ~5,000 files.
Jeff
* Re: Why pack+unpack?
From: Linus Torvalds @ 2005-07-26 4:53 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Git Mailing List
On Tue, 26 Jul 2005, Jeff Garzik wrote:
>
> AFAICT this is
> just a complete waste of time. Why does this occur?
>
> Packing 1394 objects
> Unpacking 1394 objects
> 100% (1394/1394) done
>
> It doesn't seem to make any sense to perform work, then immediately undo
> that work, just for a local pull.
First, make sure you have a recent git: it does better at optimizing the
objects, so there are fewer of them. Of course, the above could be a real
pull of a fair amount of work, but check that your git has this commit:
commit 4311d328fee11fbd80862e3c5de06a26a0e80046
Be more aggressive about marking trees uninteresting
because otherwise you sometimes get a fair number of extra objects:
git-rev-list wasn't always being very careful, and took more objects than
it strictly needed.
Secondly, what's the problem? Sure, I could special-case the local case,
but do you really want to have two _totally_ different code-paths? In
other words, it's absolutely NOT a complete waste of time: it's very much
a case of trying to have a unified architecture, and the fact that it
spends a few seconds doing things in a way that is network-transparent is
time well spent.
Put another way: do you argue that X network transparency is a total waste
of time? You could certainly optimize X if you always made it be
local-machine only. Or you could make tons of special cases, and have X
have separate code-paths for local clients and for remote clients, rather
than just always opening a socket connection.
See? Trying to have one really solid code-path is not a waste of time.
We do end up having a special code-path for "clone" (the "-l" flag), which
does need it, but I seriously doubt you need it for a local pull. The most
expensive operation in a local pull tends to be (if the repositories are
unpacked and cold-cache) just figuring out the objects to pull, not the
packing/unpacking per se.
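The round trip being discussed can be sketched as a pipeline. This is a
minimal illustration in modern "git pack-objects" / "git unpack-objects"
spellings (the dashed git-pack-objects forms in this thread are the 2005
equivalents); the two throwaway repositories and file contents are
invented for the example:

```shell
set -e
tmp=$(mktemp -d)

# Upstream repository with one commit.
git init -q "$tmp/src"
cd "$tmp/src"
echo hello > file
git add file
git -c user.name=t -c user.email=t@example.com commit -qm initial
sha=$(git rev-parse HEAD)

# Empty downstream repository.
git init -q "$tmp/dst"

# Enumerate the objects to send, pack them into a byte stream, and
# immediately unpack that stream on the receiving side -- the same
# network-transparent path, even though both ends are local directories.
git rev-list --objects HEAD \
    | git pack-objects --stdout 2>/dev/null \
    | git -C "$tmp/dst" unpack-objects -q

# The commit now exists as a loose object in the destination.
git -C "$tmp/dst" cat-file -t "$sha"    # prints "commit"
```

The pack stream is exactly what would travel over a socket in the remote
case; only the transport differs.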
Linus
* Re: Why pack+unpack?
From: Jeff Garzik @ 2005-07-26 5:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds wrote:
> First, make sure you have a recent git, it does better at optimizing the
I was using vanilla git, as of 10 minutes before I sent the email. Top
of tree is 154d3d2dd2656c23ea04e9d1c6dd4e576a7af6de.
> Secondly, what's the problem? Sure, I could special-case the local case,
> but do you really want to have two _totally_ different code-paths? In
> other words, it's absolutely NOT a complete waste of time: it's very much
> a case of trying to have a unified architecture, and the fact that it
> spends a few seconds doing things in a way that is network-transparent is
> time well spent.
>
> Put another way: do you argue that X network transparency is a total waste
> of time? You could certainly optimize X if you always made it be
> local-machine only. Or you could make tons of special cases, and have X
> have separate code-paths for local clients and for remote clients, rather
> than just always opening a socket connection.
Poor example... sure it opens a socket, but X certainly does have a
special-case local path (MIT-SHM), and they're adding more for 3D due
to the massive amount of data involved.
> We do end up having a special code-path for "clone" (the "-l" flag), which
> does need it, but I seriously doubt you need it for a local pull. The most
> expensive operation in a local pull tends to be (if the repositories are
> unpacked and cold-cache) just figuring out the objects to pull, not the
> packing/unpacking per se.
Well, I'm not overly concerned, mostly curious. The pack+unpack step
(a) appears completely redundant and (b) is the step that takes the most
time here, for local pulls, after the diffstat.
Jeff
* Re: Why pack+unpack?
From: Linus Torvalds @ 2005-07-26 16:44 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Git Mailing List
On Tue, 26 Jul 2005, Jeff Garzik wrote:
> >
> > Put another way: do you argue that X network transparency is a total waste
> > of time? You could certainly optimize X if you always made it be
> > local-machine only. Or you could make tons of special cases, and have X
> > have separate code-paths for local clients and for remote clients, rather
> > than just always opening a socket connection.
>
> Poor example... sure it opens a socket, but X certainly does have a
> special case local path (mit shm), and they're adding more for 3D due
> the massive amount of data involved in 3D.
.. and that's still a special case. Exactly like git does the "clone -l"
special case.
> Well, I'm not overly concerned, mostly curious. The pack+unpack step
> (a) appears completely redundant and (b) is the step that takes the most
> time here, for local pulls, after the diffstat.
It's not actually redundant. Some of the _compression_ may be, and you
could see if you prefer a smaller delta window (use "--window=0" to
git-pack-objects to totally disable delta compression), but in general you
can't actually just link the files over like with "git clone", because
that would create total chaos and a real mess if the other end was packed.
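The delta-window knob mentioned above can be observed directly. A minimal
sketch in modern git spellings, with a made-up repository whose two blobs
differ by a single line (a good delta candidate):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
seq 1 500 > data
git add data
git -c user.name=t -c user.email=t@example.com commit -qm one
seq 0 500 > data
git add data
git -c user.name=t -c user.email=t@example.com commit -qm two

# Pack the same object list twice: once with the default delta window,
# once with --window=0, which disables delta compression entirely.
git rev-list --objects HEAD > objs
git pack-objects --stdout < objs > deltas.pack 2>/dev/null
git pack-objects --window=0 --stdout < objs > nodeltas.pack 2>/dev/null

# The delta-less pack stores both similar blobs in full, so it is larger.
wc -c deltas.pack nodeltas.pack
```

With deltas enabled, the second blob is stored as a small delta against
the first instead of as a second full (deflated) copy.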
So "git pull" actually needs to copy one object at a time in order to have
sensible semantics together with "git repack". Now, you could make that
"one object at a time" thing have its own special cases ("if it's packed,
extract it as an unpacked object in the destination; if it's unpacked,
just link it if you can"), but it would just be pretty ugly.
If it ever gets to be a real performance problem, we can certainly fix it,
but in the meantime I _much_ prefer having one single path. I dislike the
rsync (and the http) paths immensely already, but at least I don't have to
use them...
Linus
* Re: Why pack+unpack?
From: Junio C Hamano @ 2005-07-26 6:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff Garzik, Git Mailing List
Linus Torvalds <torvalds@osdl.org> writes:
> See? Trying to have one really solid code-path is not a waste of time.
An alternative code path specialized for the local case would not be
too bad.
First, find the list of objects to copy. You can use the alternate
object pool mechanism to cover both the upstream repository you pull
from and the downstream repository you pull into (both local), then
run rev-list --objects, giving it the upstream head SHA1 you are
pulling plus a '^'-prefixed entry for every ref in the downstream
repository. If the upstream head you are pulling is a tag, you may
need to dereference it as well.
Among those objects, ones unpacked in the upstream can be
copied/linked to the downstream repository.
Handling packs involves a bit of a policy decision. The current
pack/unpack way always unpacks; to emulate it, we could cat-file each
object in the upstream object database and pipe that to "hash-object -w"
(after giving hash-object an option to read from the standard input) to
write it unpacked into the downstream repository. An easier alternative
is to just hardlink all the packs from the upstream object database into
the downstream object database, and keep packed things packed.
Well, it starts to sound somewhat bad...