git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Levin Du <zslevin@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Questions about git-push for huge repositories
Date: Tue, 8 Sep 2015 01:00:27 -0400
Message-ID: <20150908050027.GB26331@sigill.intra.peff.net>
In-Reply-To: <CAN6cQGMf089ERn2kZbFpHJ6vyJ4BnjCm-m-E+hQsduH55XFvKw@mail.gmail.com>

On Mon, Sep 07, 2015 at 09:05:41AM +0800, Levin Du wrote:

> > Instead, the object transfer is optimized by comparing what commits
> > each side has and sending trees and blobs that are reachable from
> > the commits that the receiving side does not have.
> 
> The sender A sends all the commits that the receiver B does not have.
> The commits contains trees and blobs. In my situation, branch in A has
> only one commit. It seems that B has received lots of duplicate blobs,
> concluded from the GC result.

Right. B tells A "I already have this commit", but A does not have that
commit itself, so the information does not help: A has no way to work
out which trees and blobs that commit implies. A therefore cannot make
any assumptions about what B has, and must send all trees and blobs
referenced by its own commit.
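
To make that concrete, here is a rough sketch in Python. It is a
simplified model of the idea, not git's actual negotiation code; the
function names and the toy object ids (commit_a, tree_a, blob_1, ...)
are made up for illustration. The sender can only exclude objects it
can reach from commits that it also has, so a commit advertised by B
that A lacks excludes nothing.

```python
def reachable(graph, tips):
    """Return every object id reachable from tips that the graph knows about."""
    seen = set()
    # A tip the sender has never seen cannot be walked, so it is ignored.
    stack = [t for t in tips if t in graph]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, ()))
    return seen


def objects_to_send(sender_graph, sender_tips, receiver_tips):
    """Objects the sender needs, minus those it can prove the receiver has."""
    wanted = reachable(sender_graph, sender_tips)
    known_shared = reachable(sender_graph, receiver_tips)
    return wanted - known_shared


# Hypothetical repository A: one commit, one tree, two blobs.
A_GRAPH = {
    "commit_a": ["tree_a"],
    "tree_a": ["blob_1", "blob_2"],
    "blob_1": [],
    "blob_2": [],
}

# B advertises commit_b, which A does not have: nothing can be excluded,
# even if B happens to hold blob_1 already.
no_shared = objects_to_send(A_GRAPH, ["commit_a"], ["commit_b"])

# If A also had commit_b (pointing at blob_1), blob_1 could be skipped.
A_WITH_B = dict(A_GRAPH, commit_b=["tree_b"], tree_b=["blob_1"])
shared = objects_to_send(A_WITH_B, ["commit_a"], ["commit_b"])
```

In the first call everything reachable from commit_a is sent; in the
second, blob_1 drops out because A can see that B's commit already
references it.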

> What I do not understand is, how duplicate blobs happen in a git repository?
> Git repository is famous for its content addressing storage system.
> I guess that A sends its packed file to B directly, no matter what are
> already in B.

Not exactly.  During a push, git may or may not keep the packfile sent
over the wire, depending on the number of objects in it and the
receive.unpackLimit config setting. The same object can exist in two
separate packfiles. One of the effects of "git gc" is to remove such
duplicates.
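
As a toy illustration of how the duplicates come about (a sketch only;
the repo dictionary and the function names are invented, and the
default of 100 is how I read the receive.unpackLimit documentation):
below the limit the objects are unpacked loose and ones that already
exist are skipped, while at or above the limit the incoming pack is
kept whole, duplicates included.

```python
def all_objects(repo):
    """Every distinct object id stored anywhere in the repository."""
    stored = set(repo["loose"])
    for pack in repo["packs"]:
        stored |= pack
    return stored


def duplicate_copies(repo):
    """Number of stored copies beyond the first for each object."""
    total = len(repo["loose"]) + sum(len(p) for p in repo["packs"])
    return total - len(all_objects(repo))


def receive_push(repo, incoming, unpack_limit=100):
    """Model of the receiving side; unpack_limit stands in for receive.unpackLimit."""
    incoming = set(incoming)
    if len(incoming) < unpack_limit:
        # Small push: explode into loose objects, skipping ones already present.
        repo["loose"] |= incoming - all_objects(repo)
    else:
        # Large push: keep the pack as sent, even if it repeats stored objects.
        repo["packs"].append(incoming)


def gc(repo):
    """Model of a repack: one pack holding each object exactly once."""
    repo["packs"] = [all_objects(repo)]
    repo["loose"] = set()


# Hypothetical receiver B that already holds 150 objects in one pack.
existing = {f"obj{i}" for i in range(150)}
repo_b = {"loose": set(), "packs": [set(existing)]}

# A pushes those same 150 objects plus one new one, with no shared history.
receive_push(repo_b, existing | {"new_obj"})
dups_before_gc = duplicate_copies(repo_b)
gc(repo_b)
dups_after_gc = duplicate_copies(repo_b)
```

Content addressing still holds here: each object has one id, but
nothing stops the same id from being stored in more than one pack
until a repack consolidates them.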

So A effectively does send its whole pack in this case, but only because
it cannot find any shared history with B (and B keeps that pack as-is
until the next gc because it is over the unpackLimit).

-Peff
