git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: "Pyeron, Jason J CTR (US)" <jason.j.pyeron.ctr@mail.mil>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Josef Wolf <jw@raven.inka.de>
Subject: Re: Re-Transmission of blobs?
Date: Thu, 12 Sep 2013 15:56:54 -0400
Message-ID: <20130912195654.GE32069@sigill.intra.peff.net>
In-Reply-To: <871B6C10EBEFE342A772D1159D132085571A7A1B@umechphj.easf.csd.disa.mil>

On Thu, Sep 12, 2013 at 12:45:44PM +0000, Pyeron, Jason J CTR (US) wrote:

> If the rules of engagement are changed a bit, the server side can be released from most of its work (CPU/IO).
> 
> Client does the following, looping as needed:
> 
> heads = server->heads();
> knownCommits = local->allCommits();
> missingBlobs = [];
> foreach (commit : heads) if (!knownCommits->contains(commit)) missingBlobs[] = commit;
> foreach (commit : knownCommits) if (!commit->isValid()) missingBlobs[] = commit->blobs();
> if (missingBlobs->size() > 0) server->fetchBlobs(missingBlobs);

That doesn't quite work. The client does not know the set of missing
objects just from the commits. It knows the sha1 of the root trees it is
missing. And then if it fetches those, it knows the sha1 of any
top-level entries it is missing. And when it gets those, it knows the
sha1 of any 2nd-level entries it is missing, and so forth.
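
To make the level-by-level discovery concrete, here is a toy sketch in
Python. It is not how git's fetch code works; the object ids, the
`remote_trees` mapping, and the `discover_missing` helper are all made
up for illustration. Each pass of the loop stands in for one request to
the server, since the client only learns a tree's entries after it has
fetched that tree.

```python
from typing import Dict, List, Set, Tuple


def discover_missing(
    root: str,
    remote_trees: Dict[str, List[str]],
    have: Set[str],
) -> Tuple[List[str], int]:
    """Find missing objects level by level.

    remote_trees maps a (hypothetical) tree id to the ids of its
    entries; ids without an entry in the mapping are treated as blobs.
    Returns the missing ids and the number of simulated round-trips.
    """
    missing: List[str] = []
    round_trips = 0
    frontier = [root] if root not in have else []
    seen = set(frontier)
    while frontier:
        round_trips += 1  # one request covers the whole current level
        missing.extend(frontier)
        next_level: List[str] = []
        for oid in frontier:
            # Entries become visible only once the tree has been fetched.
            for child in remote_trees.get(oid, []):
                if child not in have and child not in seen:
                    seen.add(child)
                    next_level.append(child)
        frontier = next_level
    return missing, round_trips


# Made-up repository layout: root -> src -> lib -> util.c
remote = {
    "root": ["src", "README"],
    "src": ["lib", "main.c"],
    "lib": ["util.c"],
}
missing, trips = discover_missing("root", remote, have={"README"})
# missing == ["root", "src", "lib", "main.c", "util.c"], trips == 4
```

In this toy layout, asking per-level costs 4 requests for 5 objects;
asking per-object would cost 5. The number of per-level requests grows
with the depth of the tree, not with the number of objects.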

You can progressively ask for each level, but:

  1. You are spending a round-trip for each request. Doing it per-object
     is awful (the dumb http walker will do this if the repo is not
     packed, and it's S-L-O-W). Doing it per-level would be better, but
     not great.

  2. You are losing opportunities for deltas (or you are making the
     state the server needs to maintain very complicated, as it must
     remember from request to request which objects you have gotten that
     can be used as delta bases).

  3. There is a lot of overhead in this protocol. The client has to
     mention each object individually by sha1. It may not seem like a
     lot, but it can easily add 10% to a clone (just look at the size of
     the pack .idx files versus the packfiles themselves).
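
As a rough way to do the comparison in point 3, the sketch below sums
the sizes of the .idx files and the .pack files in a pack directory and
returns their ratio. The `idx_overhead` name is hypothetical, and the
ratio is only a proxy: a .idx stores offsets and checksums next to each
sha1, so it is somewhat larger than a bare list of object names.

```python
from pathlib import Path


def idx_overhead(pack_dir: str) -> float:
    """Return total .idx bytes divided by total .pack bytes in pack_dir."""
    directory = Path(pack_dir)
    idx_bytes = sum(p.stat().st_size for p in directory.glob("*.idx"))
    pack_bytes = sum(p.stat().st_size for p in directory.glob("*.pack"))
    if pack_bytes == 0:
        raise ValueError(f"no packfile data found in {pack_dir}")
    return idx_bytes / pack_bytes
```

Run from the top of a packed repository, this would be called as
`idx_overhead(".git/objects/pack")`. The result depends heavily on the
repository, so treat any single number as anecdotal.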

-Peff


Thread overview: 19+ messages
2013-09-10 13:08 Re-Transmission of blobs? Josef Wolf
2013-09-10 17:51 ` Junio C Hamano
2013-09-11 11:27   ` Josef Wolf
2013-09-11 17:14     ` Junio C Hamano
2013-09-12  7:42       ` Josef Wolf
2013-09-12  9:23         ` Jeff King
2013-09-12 10:35           ` Josef Wolf
2013-09-12 19:44             ` Jeff King
2013-09-13 10:09               ` Josef Wolf
2013-09-16 21:55                 ` Jeff King
2013-09-20  9:27                   ` Josef Wolf
2013-09-24  7:36                     ` Jeff King
2013-09-24 20:36                       ` Josef Wolf
2013-09-12 12:45           ` Pyeron, Jason J CTR (US)
2013-09-12 19:56             ` Jeff King [this message]
2013-09-12 20:06               ` Pyeron, Jason J CTR (US)
2013-09-13 10:23                 ` Josef Wolf
2013-09-13 11:51                   ` Jason Pyeron
2013-09-13 12:16                 ` Duy Nguyen
