From: Junio C Hamano <gitster@pobox.com>
To: Josef Wolf <jw@raven.inka.de>
Cc: git@vger.kernel.org
Subject: Re: Re-Transmission of blobs?
Date: Tue, 10 Sep 2013 10:51:02 -0700
Message-ID: <xmqqsixcy395.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20130910130837.GA14259@raven.wolf.lan> (Josef Wolf's message of "Tue, 10 Sep 2013 15:08:38 +0200")

Josef Wolf <jw@raven.inka.de> writes:

> as we all know, files are identified by their SHA. Thus I had the impression
> that when transferring files, git would know by the SHA whether a given file is
> already available in the destination repository and the transfer would be of
> no use.

That is unfortunately not how things work.  It is not like the
receiving end sends the names of all objects it has, and the sending
end excludes these objects from what it is going to send.

Consider this simple history with only a handful of commits (as
usual, time flows from left to right):

              E
             /   
    A---B---C---D

where D is at the tip of the sending side, E is at the tip of the
receiving side.  The exchange goes roughly like this:

    (receiving side): what do you have?

    (sending side): my tip is at D.

    (receiving side): D?  I've never heard of it --- please give it
                      to me.  I have E.

    (sending side): E?  I don't know about it; must be something you
                    created since you forked from me.  Tell me about
                    its ancestors.

    (receiving side): OK, I have C.

    (sending side): Oh, C I know about. You do not have to tell me
                    anything more.  A packfile to bring you up to
                    date will follow.
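
The exchange above can be sketched roughly as follows.  This is a toy
model, not the real wire protocol: the commit graphs are plain
dictionaries, the names SENDER, RECEIVER and negotiate() are made up
for illustration, and it stops at the first common commit it finds,
whereas a real history can have several.

```python
from collections import deque

# Toy commit graphs: commit name -> list of parent names (assumed data).
SENDER = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
RECEIVER = {"A": [], "B": ["A"], "C": ["B"], "E": ["C"]}


def negotiate(sender, receiver, receiver_tip):
    """Return the first commit offered by the receiver that the sender knows."""
    queue = deque([receiver_tip])
    seen = set()
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in sender:
            # "Oh, C I know about.  You do not have to tell me anything more."
            return commit
        # "Tell me about its ancestors."
        queue.extend(receiver[commit])
    return None  # no common history at all


print(negotiate(SENDER, RECEIVER, "E"))  # C
```

Note that the walk never reaches B or A: once C is recognized, the
receiver stops describing its history.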

At this point, the sender knows that the receiver needs the commit
D, and trees and blobs in D.  It also knows that the receiver has the
commit C and trees and blobs in C.  It does the best thing it can do
using this (and only this) information, namely, to send the commit D,
and send trees and blobs in D that are not in the commit C.

You may happen to have something in E that matches what is in D but
not in C.  Because the sender does not know anything about E at all
in the first place, that information cannot be used to reduce the
transfer.
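
A minimal sketch of that computation, under simplifying assumptions:
only blobs are modeled (commit and tree objects are left out), and
the paths and blob names in TREES are invented for the example.

```python
# Assumed contents of each commit: path -> blob name.
TREES = {
    "C": {"README": "blob-readme-v1", "main.c": "blob-main-v1"},
    "D": {"README": "blob-readme-v1", "main.c": "blob-main-v2",
          "util.c": "blob-util"},
    # E exists only on the receiving side; the sender never sees it.
    "E": {"README": "blob-readme-v1", "main.c": "blob-main-v1",
          "util.c": "blob-util"},
}


def blobs_to_send(wanted, common):
    """Blobs in the wanted commit that are not in the common commit."""
    return set(TREES[wanted].values()) - set(TREES[common].values())


to_send = blobs_to_send("D", "C")
print(sorted(to_send))  # ['blob-main-v2', 'blob-util']

# The receiver already has blob-util via E, but it is sent anyway,
# because the sender compared D against C only.
redundant = to_send & set(TREES["E"].values())
print(sorted(redundant))  # ['blob-util']
```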

The sender theoretically _could_ also exploit the fact that any
receiver that has C must have B and A and all trees and blobs
associated with these ancestor commits [*1*], but that information
is currently neither discovered nor used during the object transfer.

There may happen to be a tree or a blob in A that matches a tree or
a blob in D.  But because the common ancestor discovery exchange
above stops at C, the sender does not bother enumerating all the
objects that are in the ancestor commits of C when figuring out what
objects to send to ensure that the receiving end has all the objects
necessary to complete D.  If you modified a blob at B (or C) and
then resurrected the old version of the blob at D, it is likely that
the blob is going to be sent again when the receiving end asks for
D.
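
The resurrected-blob case can be illustrated with a small sketch.  It
assumes a linear history with a single file per commit; HISTORY,
BLOB_AT and both functions are hypothetical names.  The second
function models the theoretical optimization mentioned above, not
anything that is done today.

```python
# Assumed linear history, oldest first, with one file per commit.
HISTORY = ["A", "B", "C", "D"]
BLOB_AT = {"A": "blob-old", "B": "blob-new", "C": "blob-new", "D": "blob-old"}


def send_using_common_only(wanted, common):
    """What is described above: compare the wanted commit with C alone."""
    return {BLOB_AT[wanted]} - {BLOB_AT[common]}


def send_using_full_history(wanted, common):
    """Theoretical variant: also exclude blobs in the ancestors of C."""
    upto = HISTORY.index(common) + 1
    known = {BLOB_AT[commit] for commit in HISTORY[:upto]}
    return {BLOB_AT[wanted]} - known


print(send_using_common_only("D", "C"))   # {'blob-old'}
print(send_using_full_history("D", "C"))  # set()
```

The first result re-sends the old blob even though the receiver has
had it since A; the second avoids that at the cost of enumerating
every object in the history behind C.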

There is some work being done to optimize this further using
various techniques, but they are not ready yet.


[Footnote]

*1* only down to the shallow boundary, if the receiving end is a
shallow clone.
