From: Jeff King <peff@peff.net>
To: "Pyeron, Jason J CTR (US)" <jason.j.pyeron.ctr@mail.mil>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
Josef Wolf <jw@raven.inka.de>
Subject: Re: Re-Transmission of blobs?
Date: Thu, 12 Sep 2013 15:56:54 -0400
Message-ID: <20130912195654.GE32069@sigill.intra.peff.net>
In-Reply-To: <871B6C10EBEFE342A772D1159D132085571A7A1B@umechphj.easf.csd.disa.mil>
On Thu, Sep 12, 2013 at 12:45:44PM +0000, Pyeron, Jason J CTR (US) wrote:
> If the rules of engagement are changed a bit, the server side can be released from most of its work (CPU/IO).
>
> Client does the following, looping as needed:
>
> Heads=server->heads();
> KnownCommits=Local->AllCommits();
> MissingBlobs=[];
> Foreach(commit:Heads) if (!KnownCommits->contains(commit)) MissingBlobs[]=commit;
> Foreach(commit:KnownCommits) if (!commit->isValid()) MissingBlobs[]=commit->blobs();
> If (MissingBlobs->size()>0) server->FetchBlobs(MissingBlobs);

That doesn't quite work. The client does not know the set of missing
objects just from the commits. It knows the sha1 of the root trees it is
missing. And then if it fetches those, it knows the sha1 of any
top-level entries it is missing. And when it gets those, it knows the
sha1 of any 2nd-level entries it is missing, and so forth.
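
The level-by-level discovery described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how git's transport works. The names `fetch_level_by_level` and `ObjectGraph` and the string object ids are hypothetical stand-ins; real git uses sha1 ids and has to parse each tree object to read its entries. Each loop iteration models one request/response, so the number of round trips equals the depth of the missing part of the tree. The sketch also assumes that a tree the client already holds is complete underneath it, so the code never descends into it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy object graph: object id -> ids it references.
# Trees list their entries; blobs list nothing. In real git the ids are
# sha1s and the client must parse a tree object to learn its entries.
ObjectGraph = Dict[str, List[str]]


def fetch_level_by_level(server: ObjectGraph, local: Set[str],
                         root_trees: List[str]) -> Tuple[int, int]:
    """Fetch objects reachable from root_trees that `local` lacks.

    The client can only name an object after it holds the tree that lists
    it, so each level of the hierarchy costs one request/response.
    Assumes a tree already in `local` is complete below it, so it is not
    descended into. Mutates `local`. Returns (round_trips, objects_fetched).
    """
    round_trips = 0
    fetched = 0
    frontier = [oid for oid in root_trees if oid not in local]
    while frontier:
        round_trips += 1  # one request naming every object at this level
        received = {oid: server[oid] for oid in frontier}
        local.update(received)  # adds the received object ids
        fetched += len(received)
        # Only now can the client read the new trees and see their entries.
        next_level: List[str] = []
        for entries in received.values():
            for oid in entries:
                if oid not in local and oid not in next_level:
                    next_level.append(oid)
        frontier = next_level
    return round_trips, fetched


if __name__ == "__main__":
    # Hypothetical four-level tree: root -> src -> lib -> util.c
    server: ObjectGraph = {
        "root": ["src", "README"],
        "src": ["lib", "main.c"],
        "lib": ["util.c"],
        "README": [],
        "main.c": [],
        "util.c": [],
    }
    print(fetch_level_by_level(server, set(), ["root"]))  # (4, 6)
    print(fetch_level_by_level(server, {"lib", "util.c"}, ["root"]))  # (3, 4)
```

Starting from an empty repository, the toy tree costs four round trips for six objects: one round trip per level, no matter how many objects sit at that level. Already having the `lib` subtree saves one round trip, because the client never has to learn what is inside it.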
You can progressively ask for each level, but:

  1. You are spending a round-trip for each request. Doing it per-object
     is awful (the dumb http walker will do this if the repo is not
     packed, and it's S-L-O-W). Doing it per-level would be better, but
     not great.

  2. You are losing opportunities for deltas (or you are making the
     state the server needs to maintain very complicated, as it must
     remember from request to request which objects you have gotten that
     can be used as delta bases).

  3. There is a lot of overhead in this protocol. The client has to
     mention each object individually by sha1. It may not seem like a
     lot, but it can easily add 10% to a clone (just look at the size of
     the pack .idx files versus the packfiles themselves).

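
A little arithmetic makes point 3 concrete. The Python sketch below computes the size of a version-2 pack `.idx`: a fixed 1072 bytes, plus 28 bytes per object, plus 8 more for each object stored past the 2 GiB offset. It also computes how many bytes a client would spend naming each object if it used one git-style `want` pkt-line (50 bytes) per object. That request format is an assumption, since the FetchBlobs proposal above does not name a wire format. The function names are hypothetical, and `overhead_ratio` expects the size of a real `.pack` as input.

```python
# Version-2 pack .idx layout (per git's pack-format documentation):
IDX_HEADER = 4 + 4           # magic "\377tOc" + version number
IDX_FANOUT = 256 * 4         # 256 four-byte cumulative object counts
IDX_PER_OBJECT = 20 + 4 + 4  # sha1 + CRC32 + 4-byte pack offset
IDX_LARGE_OFFSET = 8         # extra entry per object at offset >= 2**31
IDX_TRAILER = 20 + 20        # packfile checksum + checksum of the .idx

# Assumed request format: one git-style "want" pkt-line per object, i.e.
# 4 hex length digits + "want " + 40 hex sha1 + "\n". Ignores the closing
# flush-pkt and any capability strings.
WANT_PKT_LINE = 4 + len("want ") + 40 + 1  # 50 bytes


def idx_v2_size(num_objects: int, num_large_offsets: int = 0) -> int:
    """Size in bytes of a v2 .idx for a pack holding num_objects objects."""
    return (IDX_HEADER + IDX_FANOUT + IDX_TRAILER
            + IDX_PER_OBJECT * num_objects
            + IDX_LARGE_OFFSET * num_large_offsets)


def per_object_request_size(num_objects: int) -> int:
    """Bytes spent naming every object, under the assumed request format."""
    return WANT_PKT_LINE * num_objects


def overhead_ratio(num_objects: int, pack_bytes: int) -> float:
    """Request bytes as a fraction of the pack that carries the objects."""
    return per_object_request_size(num_objects) / pack_bytes
```

For 100,000 objects, the assumed request alone is 5,000,000 bytes, and the matching `.idx` is 2,801,072 bytes. Divide either figure by the size of the corresponding `.pack` to get the kind of ratio Peff suggests looking at.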
-Peff