git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Wolf <jw@raven.inka.de>
To: git@vger.kernel.org
Subject: Re: Re-Transmission of blobs?
Date: Fri, 20 Sep 2013 11:27:15 +0200	[thread overview]
Message-ID: <20130920092715.GG14259@raven.wolf.lan> (raw)
In-Reply-To: <20130916215536.GB5477@sigill.intra.peff.net>

On Mon, Sep 16, 2013 at 05:55:36PM -0400, Jeff King wrote:
> On Fri, Sep 13, 2013 at 12:09:35PM +0200, Josef Wolf wrote:
> 
> > > > I'm not sure I understand correctly. I see that bitmaps can be used to
> > > > implement set operations. But how comes that walking the graph requires a lot
> > > > of CPU? Isn't it O(n)?
> > > 
> > > Yes and no. Your "n" there is the entirety of history.
> > 
> > Is this really true?
> 
> Yes. If you know that the receiver has commit X, and you want to know if
> it has some blob Y, the only way to know for sure is to look at every
> tree of every commit reachable from X, and see whether any of them
> references Y.

Jeff, in my original example, I did a cherry-pick from origin/somebranch.
Even without asking, we can assume with great probability that
origin/somebranch is available at origin. And the file in question happens to
reside in the tree at the very tip of origin/somebranch, not in some of its
ancestors. In this case, there's no need to search the history at all. And
even in this pretty simple case, the algorithm seems to fail for some reason.

> Yes, you do not have to recurse into sub-trees (or commits) you have
> already seen. And we already do that optimization.

Why is the file re-transmitted, then?


With a little change in the protocol, a very simple optimization could be
implemented, avoiding the complicated bitmap strategy you were talking about:

Please consider Junio's description of the dialog:

[ Junio wrote: ]
> Consider this simple history with only a handful of commits (as
> usual, time flows from left to right):
>
>              E
>             /   
>    A---B---C---D
>
> where D is at the tip of the sending side, E is at the tip of the
> receiving side.  The exchange goes roughly like this:
>
>    (receiving side): what do you have?
>
>    (sending side): my tip is at D.
>
>    (receiving side): D?  I've never heard of it --- please give it
>                      to me.  I have E.
>
>    (sending side): E?  I don't know about it; must be something you
>                    created since you forked from me.  Tell me about
>                    its ancestors.
>
>    (receiving side): OK, I have C.
>
>    (sending side): Oh, C I know about. You do not have to tell me
>                    anything more.  A packfile to bring you up to
>                    date will follow.

In the last step, instead of sending a packfile, the sending side should send
a list of the SHA's which would be included in this packfile. The receiving
side would then be able to request all the objects it needs to get up-to-date.

I think this change would be considerably simpler than the reachability bitmap
you are talking about. And it would avoid all those time consuming traversals
through the history and the tree. And it would omit _all_ redundant
retransmissions. Even in the case when sender and receiver do not have _any_
common heads at all, _no_ files at all would be retransmitted unnecessarily.

-- 
Josef Wolf
jw@raven.inka.de

  reply	other threads:[~2013-09-20  9:30 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-10 13:08 Re-Transmission of blobs? Josef Wolf
2013-09-10 17:51 ` Junio C Hamano
2013-09-11 11:27   ` Josef Wolf
2013-09-11 17:14     ` Junio C Hamano
2013-09-12  7:42       ` Josef Wolf
2013-09-12  9:23         ` Jeff King
2013-09-12 10:35           ` Josef Wolf
2013-09-12 19:44             ` Jeff King
2013-09-13 10:09               ` Josef Wolf
2013-09-16 21:55                 ` Jeff King
2013-09-20  9:27                   ` Josef Wolf [this message]
2013-09-24  7:36                     ` Jeff King
2013-09-24 20:36                       ` Josef Wolf
2013-09-12 12:45           ` Pyeron, Jason J CTR (US)
2013-09-12 19:56             ` Jeff King
2013-09-12 20:06               ` Pyeron, Jason J CTR (US)
2013-09-13 10:23                 ` Josef Wolf
2013-09-13 11:51                   ` Jason Pyeron
2013-09-13 12:16                 ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130920092715.GG14259@raven.wolf.lan \
    --to=jw@raven.inka.de \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).