From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Levin Du <zslevin@gmail.com>, git@vger.kernel.org
Subject: Re: Questions about git-push for huge repositories
Date: Tue, 8 Sep 2015 17:54:57 -0400
Message-ID: <20150908215457.GC24159@sigill.intra.peff.net>
In-Reply-To: <xmqq4mj493cp.fsf@gitster.mtv.corp.google.com>

On Tue, Sep 08, 2015 at 11:24:06AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > If you turn on reachability bitmaps, git _will_ do the thorough set
> > difference, because it becomes much cheaper to do so. E.g., try:
> >
> >   git repack -adb
> >
> > in repo A to build a single pack with bitmaps enabled. Then a subsequent
> > push should send only a single object (the new commit).
>
> Hmph, A has the tip of B, and has a new commit B hasn't seen but A
> knows that new commit's tree matches the tree of the tip of B.
>
> Wouldn't --thin transfer from A to B know to send only that new
> commit object without sending anything below the tree in such a
> case, even without the bitmap?

I started to write about that in my analysis, but it gets confusing
quickly. There are actually many tip trees, because A and B also share
all of their tags. We do not mark every blob of every tip tree as a
preferred base, because it is expensive to do so (and it just clogs our
object array). Plus this only helps in the narrow circumstance that we
have the exact same tree as the tip (and not, say, the same tree as
master^, which I think it would be unreasonable to expect git to find).

But if we do:

  (cd ../B && git tag | xargs git tag -d)

to delete all of the other tips besides master, leaving only the one
that we know has the same tree, I'd expect git to figure it out.
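
To make the expected outcome concrete, here is a toy model of the thorough
set difference (the thing bitmaps make cheap). It is only a sketch, not
git's code: the object names, the GRAPH table, and the reachable() and
objects_to_send() helpers are all invented for illustration. A has one new
commit whose tree is identical to the tree of B's tip, so everything below
that tree is already on the other side:

```python
# Toy object graph: each object maps to the objects it points at.
# All names are hypothetical; this is not how git stores anything.
GRAPH = {
    "commit_new": ["commit_old", "tree_T"],  # A's new commit, same tree as B's tip
    "commit_old": ["tree_T"],                # the tip that B already has
    "tree_T": ["blob_1", "blob_2"],
    "blob_1": [],
    "blob_2": [],
}


def reachable(tips):
    """Return every object reachable from the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(GRAPH[obj])
    return seen


def objects_to_send(want, have):
    """Thorough set difference: what 'want' reaches that 'have' does not."""
    return reachable(want) - reachable(have)


to_send = objects_to_send(["commit_new"], ["commit_old"])
print(sorted(to_send))
```

In this model only commit_new is left over, which matches the "single
object" result one would hope for from the real push.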

Certainly I would not expect it to preserve all of the delta compression,
in the sense that we may throw away on-disk deltas against older objects
(because we don't realize the other side has those older objects). But I
would have thought that before we even hit that phase, adding those objects
as "preferred bases" would have marked them as "do not send" in the
first place.

There is code in have_duplicate_entry() to handle this. I wonder why it
doesn't kick in.
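
The mechanism in question can be sketched as a toy model. This is a loose
Python paraphrase, assuming the duplicate check behaves roughly as
described above; it is not git's actual code, and the PackList class, its
fields, and the object names are invented for illustration. The idea is
that an object queued for sending and also offered as an excluded
(preferred base) object ends up flagged "do not send":

```python
class PackList:
    """Toy stand-in for the list of objects being considered for a pack."""

    def __init__(self):
        self.preferred_base = {}  # object name -> True if it must not be sent
        self.nr_result = 0        # number of objects that will actually be sent

    def have_duplicate_entry(self, name, exclude):
        """Return True if 'name' is already listed, updating it if excluded."""
        if name not in self.preferred_base:
            return False
        if exclude:
            if not self.preferred_base[name]:
                self.nr_result -= 1  # it was going to be sent; now it is not
            self.preferred_base[name] = True
        return True

    def add(self, name, exclude):
        """Queue an object, either for sending or as a preferred base."""
        if self.have_duplicate_entry(name, exclude):
            return
        self.preferred_base[name] = exclude
        if not exclude:
            self.nr_result += 1

    def to_send(self):
        """Names of the objects still marked for sending, sorted."""
        return sorted(n for n, base in self.preferred_base.items() if not base)


pack = PackList()
for name in ["commit_new", "tree_T", "blob_1"]:
    pack.add(name, exclude=False)  # reached from the commit being pushed
for name in ["tree_T", "blob_1"]:
    pack.add(name, exclude=True)   # also part of the other side's tip tree

print(pack.to_send(), pack.nr_result)
```

In this model the tree and blob are dropped from the send list no matter
which of the two add() calls comes first, leaving only commit_new. Whether
the real code reaches that path for every object in the tip tree is exactly
the open question.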
-Peff