git.vger.kernel.org archive mirror
From: Eric Wong <e@yhbt.net>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	Caleb Gray <hey@calebgray.com>,
	git@vger.kernel.org
Subject: Re: Add a "Flattened Cache" to `git --clone`?
Date: Fri, 15 May 2020 21:42:57 +0000
Message-ID: <20200515214257.GA21855@dcvr>
In-Reply-To: <20200514214404.bcbjskgi52bwedlh@chatter.i7.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, May 14, 2020 at 02:23:44PM -0700, Junio C Hamano wrote:
> > > I think something like git-caching-proxy would be a neat project, 
> > > because it would significantly improve mirroring for CI deployments 
> > > without requiring that each individual job implements clone.bundle 
> > > prefetching.
> > 
> > What are we improving with such a proxy, though?
> > 
> > Not bandwidth to the client, apparently. 
> 
> Well, if it sits in front of the CI subnet, then it *does* save 
> bandwidth.

Agreed.

> Here's an example with the exact situation we have:
> 
> - the Gerrit server is on the US West Coast
> - the CI builder is on the East Coast
> - each CI job does a full transfer of the multi-MB repo across the 
>   continent, even when cloning shallow
> 
> We solve this by having a local mirror of the repository, but this 
> requires active mirroring to be pre-setup. A caching proxy that could:
> 
> - receive a request for a repository
> - stream the response back to the client
> - cache objects locally
> - use local cache to construct future requests, so only missing objects 
>   are fetched from the remote repo regardless of the haves on the actual 
>   client...

An off-the-shelf HTTP caching proxy (e.g. polipo, Squid) could
do a good enough job with dumb HTTP clones (forced by setting
GIT_SMART_HTTP=0 in the client's environment).
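
A minimal sketch of what a CI job might run, assuming a caching
proxy is already listening in front of the CI subnet. The proxy
address and the dumb_clone helper name are hypothetical, and the
origin server has to serve the repository as static files (i.e.
`git update-server-info` has been run there) for dumb HTTP to
work at all:

```shell
#!/bin/sh
# Hypothetical proxy address; point this at the caching proxy
# that sits in front of the CI subnet.
PROXY_URL="${PROXY_URL:-http://cache.ci.example:3128}"

# Clone over dumb HTTP through the proxy.
# GIT_SMART_HTTP=0 tells git not to try the smart protocol, so it
# requests static files (info/refs, packs) that an ordinary HTTP
# cache can store and replay for the next job.
dumb_clone() {
    GIT_SMART_HTTP=0 git -c http.proxy="$PROXY_URL" clone "$@"
}
```

A job would then call something like
`dumb_clone http://git.example/repo.git` in place of a plain
`git clone`.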

With well-packed repos, the dumb HTTP transfer cost shouldn't be
too high (and git 2.10+ got way faster on the client side with
poorly-packed repos, thanks to the Linux kernel-derived list.h).

The occasional full repack on the source git server will
invalidate caches and result in a giant download; but it's
better than no caching at all and doing giant cross-country
transfers all day long.
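
The invalidation happens because dumb HTTP clients fetch packs by
filename (objects/pack/pack-<hash>.pack), and a full repack writes
a pack under a new name that no cache has seen yet. A throwaway
local sketch to show the rename (the temporary repo and the
`commit` helper are only for illustration):

```shell
#!/bin/sh
set -e

# Scratch repository, used only for this demonstration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Helper: make an empty commit without relying on global identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$1"
}

commit one
git repack -a -d -q                      # everything in one pack
before=$(ls .git/objects/pack/*.pack)

commit two
git repack -a -d -q                      # full repack, old pack deleted
after=$(ls .git/objects/pack/*.pack)

# The two names differ, so a cache keyed on the old URL is useless.
echo "before: ${before##*/}"
echo "after:  ${after##*/}"
```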

That said, I'm not sure if any client-side caching proxy can
MITM HTTPS and still save bandwidth now that HTTPS is everywhere.
I seem to recall polipo being abandoned because of HTTPS.
Maybe there's a caching HTTPS MITM proxy out there...

Thread overview: 17+ messages
2020-05-14 14:34 Add a "Flattened Cache" to `git --clone`? Caleb Gray
2020-05-14 20:33 ` Konstantin Ryabitsev
2020-05-14 20:54   ` Bryan Turner
2020-05-14 21:05   ` Theodore Y. Ts'o
2020-05-14 21:09     ` Eric Sunshine
2020-05-14 21:10     ` Konstantin Ryabitsev
2020-05-14 21:23       ` Junio C Hamano
2020-05-14 21:44         ` Konstantin Ryabitsev
2020-05-15 21:42           ` Eric Wong [this message]
2020-05-17 22:12             ` Konstantin Ryabitsev
     [not found]               ` <1061511589863147@mail.yandex.ru>
2020-05-25 14:02                 ` Caleb Gray
2020-05-14 21:33     ` Caleb Gray
2020-05-14 21:56       ` Junio C Hamano
2020-05-14 22:04         ` Caleb Gray
2020-05-14 22:30           ` Junio C Hamano
2020-05-14 22:44           ` Bryan Turner
2020-05-14 21:19   ` Junio C Hamano
