From: Eric Wong <e@yhbt.net>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Junio C Hamano <gitster@pobox.com>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Caleb Gray <hey@calebgray.com>,
git@vger.kernel.org
Subject: Re: Add a "Flattened Cache" to `git --clone`?
Date: Fri, 15 May 2020 21:42:57 +0000
Message-ID: <20200515214257.GA21855@dcvr>
In-Reply-To: <20200514214404.bcbjskgi52bwedlh@chatter.i7.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, May 14, 2020 at 02:23:44PM -0700, Junio C Hamano wrote:
> > > I think something like git-caching-proxy would be a neat project,
> > > because it would significantly improve mirroring for CI deployments
> > > without requiring that each individual job implements clone.bundle
> > > prefetching.
> >
> > What are we improving with such a proxy, though?
> >
> > Not bandwidth to the client, apparently.
>
> Well, if it sits in front of the CI subnet, then it *does* save
> bandwidth.

Agreed.

> Here's an example with the exact situation we have:
>
> - the Gerrit server is on the US West Coast
> - the CI builder is on the East Coast
> - each CI job does a full transfer of the multi-MB repo across the
> continent, even when cloning shallow
>
> We solve this by having a local mirror of the repository, but this
> requires active mirroring to be pre-setup. A caching proxy that could:
>
> - receive a request for a repository
> - stream the response back to the client
> - cache objects locally
> - use local cache to construct future requests, so only missing objects
> are fetched from the remote repo regardless of the haves on the actual
> client...

An off-the-shelf HTTP caching proxy (e.g. polipo, Squid) could
do a good enough job with dumb HTTP clones (forced on the client
by setting GIT_SMART_HTTP=0 in its environment).
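
As a rough illustration of the client side, here is a minimal Python
sketch that builds the command line and environment for such a clone.
The function name, the example URLs and the proxy address are all
hypothetical; http.proxy is git's standard config key for an HTTP
proxy, and GIT_SMART_HTTP=0 is the setting mentioned above.

```python
import os


def dumb_http_clone_command(repo_url, proxy_url, dest):
    """Return (argv, env) for a dumb-HTTP clone sent through a proxy.

    Nothing is executed here; the caller decides when to run it.
    """
    env = dict(os.environ)
    # Ask git not to use the smart protocol, so the clone is made of
    # plain GETs for static files that an HTTP proxy can cache.
    env["GIT_SMART_HTTP"] = "0"
    # Route this one invocation through the (hypothetical) caching proxy.
    argv = ["git", "-c", "http.proxy=" + proxy_url, "clone", repo_url, dest]
    return argv, env
```

A CI job could then run it with something like
subprocess.run(argv, env=env, check=True). Note that the repository URL
has to be plain http:// for the proxy to see (and cache) the responses.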

With well-packed repos, the dumb HTTP transfer cost shouldn't be
too high (and git 2.10+ got way faster on the client side with
poorly-packed repos, thanks to the Linux kernel-derived list.h).

The occasional full repack on the source git server will
invalidate caches and result in a giant download; but it's
better than no caching at all and doing giant cross-country
transfers all day long.
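
To make the repack point concrete, here is a toy Python model of a
URL-keyed cache. It is only a sketch under simplifying assumptions: the
UrlCache class, the pack names and the 1000-byte size are made up, and
a real proxy also honors HTTP cache headers. The one thing it relies on
is that a repack produces a pack file under a new name, so the proxy
has nothing cached for the new URL.

```python
class UrlCache:
    """Toy stand-in for an HTTP caching proxy: bodies are keyed by URL."""

    def __init__(self):
        self.store = {}
        self.bytes_from_origin = 0

    def get(self, url, origin):
        # On a miss, fetch from the origin and remember the body.
        if url not in self.store:
            body = origin[url]
            self.bytes_from_origin += len(body)
            self.store[url] = body
        return self.store[url]


BASE = "http://git.example.org/repo.git/objects/pack/"  # hypothetical

# Origin server before the repack: one pack (made-up name and size).
origin = {BASE + "pack-aaaa.pack": b"x" * 1000}
cache = UrlCache()
cache.get(BASE + "pack-aaaa.pack", origin)  # first CI job: cache miss
cache.get(BASE + "pack-aaaa.pack", origin)  # second CI job: cache hit
after_two_jobs = cache.bytes_from_origin

# A full repack replaces it with a pack under a different name.
origin = {BASE + "pack-bbbb.pack": b"x" * 1000}
cache.get(BASE + "pack-bbbb.pack", origin)  # miss again: full download
after_repack = cache.bytes_from_origin
```

Two jobs against the same pack cost one origin transfer; the first job
after the repack pays for the whole pack again, and later jobs hit the
cache until the next repack.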

That said, I'm not sure if any client-side caching proxies can
MITM HTTPS and save bandwidth with HTTPS everywhere, nowadays.
I seem to recall polipo being abandoned because of HTTPS.

Maybe there's a caching HTTPS MITM proxy out there...