From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP
Date: Wed, 10 Feb 2016 13:01:15 -0800
Message-ID: <20160210210115.GA10155@google.com>
In-Reply-To: <CAGZ79kZMvxa5Np4GbShv_A6NZwVAqff94+d8MFTZwrZS+2CqeQ@mail.gmail.com>

Stefan Beller wrote:
> On Wed, Feb 10, 2016 at 12:11 PM, Shawn Pearce <spearce@spearce.org> wrote:

>> Several of us at $DAY_JOB talked about this more today and thought a
>> variation makes more sense:
>>
>> 1. Clients attempting clone ask for /info/refs?service=git-upload-pack
>> like they do today.
>>
>> 2. Servers that support resumable clone include a "resumable"
>> capability in the advertisement.
>
> like "resumable-token=hash" similar to a push cert advertisement?

It could just be the string 'resumable'.

But I wonder if it would be possible to save a round-trip by getting the
302 response in the initial request.  If the client requests

	/info/refs?service=git-upload-pack&want_resumable=true

then the server could answer with a 302 redirect to its current mostly
whole pack.  Current clients would never send such a request because the
current protocol requires that, for smart clients,

	The request MUST contain exactly one query parameter,
	`service=$servicename`, where `$servicename` MUST be the service
	name the client wishes to contact to complete the operation.
	The request MUST NOT contain additional query parameters.

Current http-backend ignores extra query parameters.  I haven't
checked other smart http server implementations, though.
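
A minimal sketch of the client side of that idea, assuming the extra
parameter is spelled want_resumable=true as above and that a supporting
server answers with a 302 whose Location points at the pack.  The
function names are made up for illustration; nothing here is an
existing git interface.

```python
from urllib.parse import urlencode


def info_refs_url(repo_url, want_resumable=False):
    """Build the ref-discovery URL for a smart HTTP clone."""
    params = [("service", "git-upload-pack")]
    if want_resumable:
        # Hypothetical extra parameter from the proposal above; the
        # current protocol text forbids additional query parameters.
        params.append(("want_resumable", "true"))
    return repo_url.rstrip("/") + "/info/refs?" + urlencode(params)


def interpret_response(status, headers):
    """Decide what the client does next.

    Returns ("download", url) when the server redirected to a pack, or
    ("negotiate", None) when it sent the usual ref advertisement.
    """
    if status == 302 and "Location" in headers:
        return ("download", headers["Location"])
    return ("negotiate", None)
```

A server that ignores the extra parameter falls through to the
"negotiate" branch, so the client loses nothing by asking.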

>> 3. Updated clients on clone request GET /info/refs?service=git-resumable-clone.
>
> Or just in the non-http case, they would terminate after the ls-remote
> (including capability advertisement) was done and connect again to
> a different service such as git-upload-stale-pack with the resumable
> token to identify the pack.

HTTP supports range requests and existing CDNs speak HTTP, so I
suspect it would work better if the git-resumable-clone service
printed an HTTP URL from which to grab the packfile.
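
For illustration, resuming such a download needs nothing beyond a
standard Range header.  This is a sketch under the assumption that the
partial pack sits in a local file whose size is the resume offset; the
function names and the file layout are hypothetical.

```python
import os
import urllib.request


def range_headers(partial_path):
    """Return request headers that ask only for the bytes still missing."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def resume_download(url, partial_path, chunk_size=1 << 16):
    """Fetch the remainder of `url` into `partial_path`."""
    request = urllib.request.Request(url, headers=range_headers(partial_path))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range, so append.  A plain
        # 200 means it sent the whole file, so overwrite instead.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```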

I think the details are something that could be figured out after
trying out the idea with http first, though.

[...]
>> 5. Clients fetch the file using standard HTTP GET, possibly with
>> byte-ranges to resume.
>
> In the non-http case the git-upload-stale-pack would be rsync with the
> resume token to determine the file name of the pack,
> such that we have resumeability.

How do I tunnel rsync over git protocol?

So I think in the non-http case the git-resumable-clone service would
have to print a URL to be served using a possibly different protocol
(e.g., a signed https URL for getting the file from a service like S3,
or an rsync URL for getting the file using the same ssh creds that
were used for the initial request).

[...]
>> 6. Once stored and indexed with .idx, clients run `git fsck
>> --lost-found` to discover the roots of the pack it downloaded. These
>> are saved as temporary references.
>
> jrn:
> > I suspect we can do even faster by making index-pack do the work
>
>     index-pack --check-self-contained-and-connected

--strict + --check-self-contained-and-connected check that the pack
is self-contained.  In the process they mark each object that is
reachable from another object in the pack with FLAG_LINK.

The objects not marked with FLAG_LINK are the roots.
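
A toy model of that rule, not index-pack's actual code: assume the pack
has already been parsed into a map from each object id to the ids it
references.  The `linked` set plays the role of FLAG_LINK, and
`find_roots` is a made-up name.

```python
def find_roots(pack_objects):
    """Return the ids of objects that no other object in the pack points to.

    `pack_objects` maps an object id to the ids it references (a commit's
    tree and parents, a tree's entries, a tag's target).
    """
    linked = set()  # stands in for FLAG_LINK
    for references in pack_objects.values():
        linked.update(references)
    return sorted(oid for oid in pack_objects if oid not in linked)


pack = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blob"],
    "tree1": ["blob"],
    "blob": [],
}
print(find_roots(pack))  # ['commit2']
```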

[...]
>> To make step 4 really resume well, clients may need to save the first
>> Location header it gets back from
>> /info/refs?service=git-resumable-clone and use that on resume. Servers
>> are likely to embed the pack SHA-1 in the Location header, and the
>> client wants to use this on subsequent GET attempts to abort early if
>> the server has deleted the pack the client is trying to obtain.

Yes.
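
A sketch of that bookkeeping, assuming the client keeps the redirect
target in a small JSON state file next to the partial pack.  The file
format, the function names and the status-code mapping are all
illustrative assumptions, not part of the proposal.

```python
import json
import os


def save_pack_location(state_path, location):
    """Record the first redirect target so a later resume reuses it."""
    if not os.path.exists(state_path):
        with open(state_path, "w") as f:
            json.dump({"location": location}, f)


def load_pack_location(state_path):
    """Return the saved redirect target, or None on a fresh clone."""
    try:
        with open(state_path) as f:
            return json.load(f)["location"]
    except FileNotFoundError:
        return None


def next_step(status):
    """Map the status of a GET on the saved URL to a client action."""
    if status in (200, 206):
        return "continue"
    if status in (404, 410):
        # The server dropped that pack, so the partial download is useless.
        return "restart"
    return "retry-later"
```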

I really like this design.  I'm tempted to implement it (since it
lacks a bunch of the downsides of clone.bundle).

Thanks,
Jonathan
