From: Jeff King <peff@peff.net>
To: Josef Wolf <jw@raven.inka.de>, git@vger.kernel.org
Subject: Re: Re-Transmission of blobs?
Date: Thu, 12 Sep 2013 15:44:53 -0400 [thread overview]
Message-ID: <20130912194453.GD32069@sigill.intra.peff.net> (raw)
In-Reply-To: <20130912103531.GD14259@raven.wolf.lan>
On Thu, Sep 12, 2013 at 12:35:32PM +0200, Josef Wolf wrote:
> I'm not sure I understand correctly. I see that bitmaps can be used to
> implement set operations. But how comes that walking the graph requires a lot
> of CPU? Isn't it O(n)?
Yes and no. Your "n" there is the entirety of history. Whereas a simple
"git push" generally only has to look at the recent history. So even
though you are looking at each commit and tree only once, it's still a
large number of them (and each one needs to be pulled off of the disk,
decompressed, and reconstructed from deltas).
Secondly, the graph traversal ends up seeing the same sha1s over and
over again in tree entries (because most entries in the tree don't
change from commit to commit). We spend a non-trivial amount of time
looking those up in a hash table.
Just try "git rev-list --objects --all" in your favorite repository to
get a sense. It takes something like 30 seconds in the kernel repo. You
would probably not want to add 30 seconds of CPU time to a trivial push.
> Those bitmaps would be stored in the git metadata? Is it worth it? Storing a
> bitmap for every commit just to be used once-in-a-while seems to be a pretty
> big overhead to me. Not to mention the interoperability problems you mentioned
> below.
There are tricks to make them smaller (run-length compression,
bitmapping a subset of commits and traversing to the nearest one,
storing bitmaps as deltas against nearby bitmaps). And how often it is
used depends on your git workload. For a repository serving git clones
and fetches, it speeds up every operation.
Try starting a clone of:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
versus
git://github.com/torvalds/linux.git
and see which one starts sending you data more quickly.
> Sounds like you're already almost done and don't really need help
> anymore. Just out of curiosity, I'd be interested in a pointer anyway
> ;-)
Shawn gave a talk on JGit here:
http://www.eclipsecon.org/2013/sites/eclipsecon.org.2013/files/Scaling%20Up%20JGit%20-%20EclipseCon%202013.pdf
and the scrapped patches for git are here:
http://article.gmane.org/gmane.comp.version-control.git/228918
-Peff
next prev parent reply other threads:[~2013-09-12 19:45 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-10 13:08 Re-Transmission of blobs? Josef Wolf
2013-09-10 17:51 ` Junio C Hamano
2013-09-11 11:27 ` Josef Wolf
2013-09-11 17:14 ` Junio C Hamano
2013-09-12 7:42 ` Josef Wolf
2013-09-12 9:23 ` Jeff King
2013-09-12 10:35 ` Josef Wolf
2013-09-12 19:44 ` Jeff King [this message]
2013-09-13 10:09 ` Josef Wolf
2013-09-16 21:55 ` Jeff King
2013-09-20 9:27 ` Josef Wolf
2013-09-24 7:36 ` Jeff King
2013-09-24 20:36 ` Josef Wolf
2013-09-12 12:45 ` Pyeron, Jason J CTR (US)
2013-09-12 19:56 ` Jeff King
2013-09-12 20:06 ` Pyeron, Jason J CTR (US)
2013-09-13 10:23 ` Josef Wolf
2013-09-13 11:51 ` Jason Pyeron
2013-09-13 12:16 ` Duy Nguyen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130912194453.GD32069@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=jw@raven.inka.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).