git.vger.kernel.org archive mirror
From: Sitaram Chamarty <sitaramc@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	milki <milki@rescomp.berkeley.edu>
Subject: Re: optimising a push by fetching objects from nearby repos
Date: Sun, 11 May 2014 06:34:19 +0530
Message-ID: <536ECC93.1070102@gmail.com>
In-Reply-To: <xmqqtx8xuz3b.fsf@gitster.dls.corp.google.com>

On 05/11/2014 02:32 AM, Junio C Hamano wrote:
> Sitaram Chamarty <sitaramc@gmail.com> writes:
>
>> Is there a trick to optimising a push by telling the receiver to pick up
>> missing objects from some other repo on its own server, to cut down even
>> more on network traffic?
>>
>> So, hypothetically,
>>
>>      git push user@host:repo1 --look-for-objects-in=repo2
>>
>> I'm aware of the alternates mechanism, but that makes the dependency on
>> the other repo sort-of permanent.
>
> In the direction of fetching, this may give a good starting point.
>
>      http://thread.gmane.org/gmane.comp.version-control.git/243918/focus=245397

That's an interesting thread and it's recent too.  However, it's about
clone (though the intro email mentions other commands also).

I'm specifically interested in push efficiency right now.  When you
"fork" someone's repo to your own space, and you push your fork to the
same server, it ought to be able to get most of the common objects from
disk (specifically, from the repo you forked), and only your own
additions from the network.

Clones do have a workaround (clone with --reference, then repack, as you
said in that thread), but no such workaround exists for push.
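
For reference, the clone-side workaround can be sketched roughly as
below.  This is only an illustration: the function name and its three
arguments are made up here, while "git clone --reference",
"git repack -a -d" and the objects/info/alternates file are standard git.

```shell
#!/bin/sh
# Sketch: clone while borrowing objects from a local reference repo,
# then repack so the new clone no longer depends on it.
# clone_then_dissociate is a hypothetical helper name.
clone_then_dissociate() {
    url=$1 reference=$2 dest=$3

    # Borrow objects already present in $reference via alternates.
    git clone --quiet --reference "$reference" "$url" "$dest" || return 1

    # Copy every object the clone needs into its own pack (-a) and drop
    # redundant packs (-d).  Borrowed objects are included because -l
    # is not passed.
    git -C "$dest" repack -a -d --quiet || return 1

    # With the objects now local, remove the link to the reference repo.
    rm -f "$dest/.git/objects/info/alternates"
}
```

Called as "clone_then_dissociate URL /path/to/reference /path/to/new",
only objects missing from the reference should need to cross the
network.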

> In the direction of pushing, theoretically you could:
>
>   - define a new capability "look-for-objects-in" to pass the name of
>     the repository from "git push" to the "receive-pack";
>
>   - have "receive-pack" temporarily borrow from the named repository
>     (if the policy on the server side allows it), and accept the push;
>
>   - repack in order to dissociate the receiving repository from the
>     other repository it temporarily borrowed from.
>
> which would be the natural inverse of the approach suggested in the
> "Can I borrow just temporarily while cloning?" thread.
>
> But I haven't thought things through with respect to what else needs
> to be modified to make sure this does not have adverse interaction
> with simultaneous pushes into the same repository, which would make
> it harder to solve for "receive-pack" than for "clone/fetch".

I'll leave it in your capable hands :-)  My C coding days are long gone!

I do have a way to do this in gitolite (haven't coded it yet; just
thinking).  Gitolite lets you specify something to do before git-*-pack
runs, and I was planning something like this:

terminology: borrow, borrower repo, reference repo

"borrow = relaxed" mode

     1.  check if the user has read access to the reference repo; skip
         the rest of this if he doesn't

     2.  from reference repo's "objects", find all directories and
         "mkdir" them into borrower's objects directory, then find all
         files and "ln" (hardlink) them. This is presumably what "clone
         -l" does.

     This method is close to constant time since we're not copying
     objects.

     It has the potential issue that if an object existed in the
     reference repo that was subsequently *deleted* (say, a commit that
     contained a password, which was quickly overwritten when
     discovered) but is still on disk because it has not been pruned
     yet, and the attacker knows the SHA, he can get the commit out by
     sending a commit that depends on it, then fetching it back.

     (He could do that to the reference repo directly if he had write
     access, but we'll assume he doesn't, so this *is* a possible
     attack).
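
A minimal sketch of step 2 of the relaxed mode, assuming both repos are
bare and live on the same filesystem (hardlinks need that), and that the
read-access check in step 1 has already passed.  The function name
"borrow_relaxed" is made up for illustration, and skipping the "info"
directory is a choice made here, not something settled above.

```shell
#!/bin/sh
# Sketch of "borrow = relaxed": hardlink every object file from the
# reference repo into the borrower.
# Usage (hypothetical): borrow_relaxed /path/to/reference.git /path/to/borrower.git
borrow_relaxed() {
    ref_objects=$1/objects
    own_objects=$2/objects

    # Recreate the directory layout (fan-out dirs, pack/, info/).
    (cd "$ref_objects" && find . -type d) | while IFS= read -r dir; do
        mkdir -p "$own_objects/$dir"
    done

    # Hardlink loose objects and packs.  Skip files under info/, which
    # hold per-repo metadata (such as alternates) rather than objects,
    # and leave alone anything the borrower already has.
    (cd "$ref_objects" && find . -type f ! -path './info/*') |
    while IFS= read -r file; do
        [ -e "$own_objects/$file" ] || ln "$ref_objects/$file" "$own_objects/$file"
    done
}
```

Because every file in the reference's object store is linked, including
objects no ref points to any more, this sketch has exactly the exposure
described above.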

"borrow = strict" mode

     1.  (same as for "relaxed" mode)

     2.  actually *fetch* all refs from the reference repo to the
         borrower (into, say, 'refs/borrowed'), then delete all those
         refs so you just have the objects now.

     Unlike the previous method, this takes time proportional to the
     delta between borrower and reference, and may load the system a bit,
     but unless the reference repo is highly volatile, this will settle
     down. The point is that it cannot be used to get anything that the
     user doesn't already have access to anyway.
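
A rough sketch of step 2 of the strict mode, again assuming bare repos
and that the access check has already passed.  "borrow_strict" is a
made-up name; the refspec simply maps everything under refs/ into the
'refs/borrowed' namespace suggested above.

```shell
#!/bin/sh
# Sketch of "borrow = strict": fetch everything reachable from the
# reference repo's refs, then drop the temporary refs again.
# $1 must be an absolute path, because git -C changes directory first.
borrow_strict() {
    reference=$1 borrower=$2

    # Fetch all refs into a scratch namespace.  Only objects reachable
    # from the reference's refs are transferred.
    git -C "$borrower" fetch --quiet --no-tags "$reference" \
        '+refs/*:refs/borrowed/*' || return 1

    # Delete the scratch refs again; the fetched objects stay in the
    # borrower's object store until a later gc prunes them.
    git -C "$borrower" for-each-ref --format='delete %(refname)' refs/borrowed |
        git -C "$borrower" update-ref --stdin
}
```

Since the objects are unreferenced once the scratch refs are gone, this
only helps if the push arrives before they are pruned.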

I still have to try it, but it sounds like both these would work.

I'd appreciate any comments though...

regards
sitaram

