From: Jeff King <peff@peff.net>
To: Levin Du <zslevin@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Questions about git-push for huge repositories
Date: Tue, 8 Sep 2015 01:44:33 -0400
Message-ID: <20150908054432.GC26331@sigill.intra.peff.net>
In-Reply-To: <CAN6cQGOO540FV9bTQPks+1nHS1xO10Rv8iNzAj8-cBihQ4_kEw@mail.gmail.com>

On Tue, Sep 08, 2015 at 09:30:09AM +0800, Levin Du wrote:

> Take kernel source code for example:
> 
> # Clone the kernel to A and B
> $ git --version
> git version 2.3.2
> $ git clone --bare  ../kernel/ A
> $ git clone --bare  ../kernel/ B

OK, two repos with the same source.

> # Create the orphan commit and check
> $ cd A
> $ git branch test
> Switched to a new branch 'test'
> $ git replace --graft test
> $ git rev-parse test
> cbbae6741c60c9e09f87521e3a79810abd6a2fda
> $ git rev-parse test^{tree}
> 929bdce0b48ca6079ad281a9d8ba24de3e49881a
> $ git rev-parse replace/cbbae6741c60c9e09f87521e3a79810abd6a2fda
> 82d3e9ce1ca062c219f1209c5291ccd5603e5302
> $ git rev-parse 82d3e9ce1ca062c219f1209c5291ccd5603e5302^{tree}
> 929bdce0b48ca6079ad281a9d8ba24de3e49881a
> $ git log --pretty=oneline 82d3e9ce1ca062c219f1209c5291ccd5603e5302 | wc -l
> 1

So you've created a new commit object, 82d3e9ce1, which has the same
tree as the original branch, but no parents.

Note that fetch and push do not respect the "replace" mechanism. They
can't, because we have no idea if the other side of the connection
shares our "replace" view of the world. So if I use "replace" to say
that commit X has parent Y, I cannot assume that pushing to some _other_
repository with X means that they also have all of Y.

But it should be OK, of course, to push the new orphan commit. I.e., if
we are pushing the object itself, not caring that it is part of a
"replace" mechanism, that should be no different than pushing any other
commit.

> $ du -hs ../B
> 1.6G ../B
> $ git push ../B 'refs/replace/*'
> Counting objects: 51216, done.
> Delta compression using up to 8 threads.
> Compressing objects: 100% (48963/48963), done.
> Writing objects: 100% (51216/51216), 139.61 MiB | 17.88 MiB/s, done.
> Total 51216 (delta 3647), reused 34580 (delta 1641)
> To ../B
> * [new branch] refs/replace/cbbae6741c60c9e09f87521e3a79810abd6a2fda -> refs/replace/cbbae6741c60c9e09f87521e3a79810abd6a2fda
> $ du -hs ../B
> 1.7G ../B
> 
> It takes some time for 'git push' to compress the objects, and B has
> grown by 0.1G in the end, all for a new commit whose tree is already
> in the repository.

Right, this is due to the commit-walking that Junio explained earlier.
We walk the commits only, and then expand the positive side (things the
other side wants) into trees and blobs. Even though we know about a
commit that the other side has that points to the tree, we don't make
the connection.

You can get a more thorough answer by expanding and marking all trees
and blobs, taking the set difference between all of the objects you want
to send, and all of the objects you know the other side has. I.e.,
basically:

  # what we want to send
  git rev-list --objects 82d3e9ce1ca062c219f1209c5291ccd5603e5302 | sort >want

  # what we know the other side has; turn off replacements, since we
  # want the real value, not with our fake replace overlaid
  git --no-replace-objects rev-list --objects refs/heads/master | sort >have

  # set difference
  comm -23 want have

which should consist of only the one commit. But if you actually ran
that, you may notice that the second rev-list takes a long time to run.
In your exact case, one can get lucky by progressively drilling down
into commits and their trees (since the tip commit of "master" happens
to share the identical tree with our new fake commit). But that is
rather an uncommon example, and in more normal cases of fetching from
somebody, building on top, and then pushing back up, it is much more
expensive. In those cases it is much more efficient to walk the small
number of new commits and then expand only their newly-added objects.
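
As a rough, self-contained illustration of that set difference (a
sketch on a toy repository, not the kernel repositories from this
thread): the script below builds a throwaway repository, creates a
parentless commit that reuses the tip's tree with git commit-tree, and
compares object IDs only. The temporary directory, the demo identity,
and the cut / sort -u step that strips path names from the rev-list
output are assumptions made for the example.

```shell
#!/bin/sh
# Throwaway repository; the path and identity below are placeholders.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two ordinary commits, standing in for the history the other side has.
echo one >file && git add file && git commit -q -m first
echo two >file && git add file && git commit -q -m second

# A parentless commit that reuses the tree of the current tip.
orphan=$(git commit-tree 'HEAD^{tree}' -m orphan)

# what we want to send (object IDs only; rev-list prints a path after
# each tree and blob, which we drop before comparing)
git rev-list --objects "$orphan" | cut -d' ' -f1 | sort -u >want

# what we know the other side has, with replacements turned off
git --no-replace-objects rev-list --objects HEAD | cut -d' ' -f1 | sort -u >have

# set difference: objects in "want" that are not in "have"
missing=$(comm -23 want have)
echo "$missing"
```

With the paths stripped, the only line left in $missing should be the
new commit itself, since its tree and blob are already reachable from
HEAD.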

If you turn on reachability bitmaps, git _will_ do the thorough set
difference, because it becomes much cheaper to do so. E.g., try:

    git repack -adb

in repo A to build a single pack with bitmaps enabled. Then a subsequent
push should send only a single object (the new commit).

Of course the time spent building the bitmaps is larger than a single
push, so this is not a good strategy if you are just trying to send one
tree.
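
If the bitmaps are meant to stay around for later pushes and fetches
rather than for a one-off, a minimal sketch of the setup might look
like this. The throwaway repository and demo identity are placeholders,
and repack.writeBitmaps is a configuration knob I am assuming your git
version has; check git-config(1) before relying on it.

```shell
#!/bin/sh
# Throwaway repository; the path and identity below are placeholders.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo one >file && git add file && git commit -q -m first

# Ask later "git repack -a" runs to write bitmaps as well
# (assumption: this git supports repack.writeBitmaps).
git config repack.writeBitmaps true

# -a: put everything into one pack, -d: delete the now-redundant packs,
# -b: write the reachability bitmap next to the new pack.
git repack -q -adb

# The bitmap lives beside the pack as pack-<hash>.bitmap.
bitmaps=$(ls .git/objects/pack/*.bitmap | wc -l)
echo "bitmap files: $bitmaps"
```

The repack cost is paid once per repack, so this only makes sense for
a repository that serves many pushes or fetches.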

-Peff
