From: Eric Wong <e@yhbt.net>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Junio C Hamano <gitster@pobox.com>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Caleb Gray <hey@calebgray.com>,
git@vger.kernel.org
Subject: Re: Add a "Flattened Cache" to `git --clone`?
Date: Fri, 15 May 2020 21:42:57 +0000
Message-ID: <20200515214257.GA21855@dcvr>
In-Reply-To: <20200514214404.bcbjskgi52bwedlh@chatter.i7.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, May 14, 2020 at 02:23:44PM -0700, Junio C Hamano wrote:
> > > I think something like git-caching-proxy would be a neat project,
> > > because it would significantly improve mirroring for CI deployments
> > > without requiring that each individual job implements clone.bundle
> > > prefetching.
> >
> > What are we improving with such a proxy, though?
> >
> > Not bandwidth to the client, apparently.
>
> Well, if it sits in front of the CI subnet, then it *does* save
> bandwidth.

Agreed.

> Here's an example with the exact situation we have:
>
> - the Gerrit server is on the US West Coast
> - the CI builder is on the East Coast
> - each CI job does a full transfer of the multi-MB repo across the
> continent, even when cloning shallow
>
> We solve this by having a local mirror of the repository, but this
> requires active mirroring to be pre-setup. A caching proxy that could:
>
> - receive a request for a repository
> - stream the response back to the client
> - cache objects locally
> - use local cache to construct future requests, so only missing objects
> are fetched from the remote repo regardless of the haves on the actual
> client...

An off-the-shelf HTTP caching proxy (e.g. polipo, Squid) could
do a good enough job with dumb HTTP clones (forced by setting
GIT_SMART_HTTP=0 in the client's environment).

With well-packed repos, the dumb HTTP transfer cost shouldn't be
too high (and git 2.10+ got way faster on the client side with
poorly-packed repos, thanks to the Linux kernel-derived list.h).

The occasional full repack on the source git server will
invalidate caches and result in a giant download; but it's
better than no caching at all and doing giant cross-country
transfers all day long.

That said, I'm not sure whether any client-side caching proxy can
MITM HTTPS and still save bandwidth now that HTTPS is everywhere.
I seem to recall polipo being abandoned because of HTTPS.
Maybe there's a caching HTTPS MITM proxy out there...
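
For the dumb HTTP route mentioned earlier in this message, a CI job
could be wired up roughly as follows. This is only a sketch under
assumptions: the helper names, the repository URL and the proxy address
are hypothetical, the proxy is assumed to be a plain-HTTP caching proxy
reachable from the builder, and the origin server is assumed to serve
the repository over dumb HTTP at all (i.e. `git update-server-info` has
been run there). The only git-specific pieces are the GIT_SMART_HTTP
environment variable and the `http.proxy` configuration key.

```python
import os
import subprocess


def dumb_http_clone_command(repo_url, dest, proxy_url, base_env=None):
    """Build (argv, env) for a clone that skips the smart HTTP protocol
    and sends its requests through a caching proxy.

    All arguments are illustrative; nothing is executed here.
    """
    # Start from the caller's environment unless one is supplied.
    env = dict(os.environ if base_env is None else base_env)
    # GIT_SMART_HTTP=0 makes the client fall back to dumb HTTP, so it
    # fetches static files (refs, pack indexes, packs) a proxy can cache.
    env["GIT_SMART_HTTP"] = "0"
    # http.proxy is passed per invocation with -c, leaving the user's
    # git configuration untouched.
    argv = ["git", "-c", "http.proxy=" + proxy_url, "clone", repo_url, dest]
    return argv, env


def run_clone(repo_url, dest, proxy_url):
    """Run the clone; raises CalledProcessError if git fails."""
    argv, env = dumb_http_clone_command(repo_url, dest, proxy_url)
    subprocess.run(argv, env=env, check=True)
```

A job would call something like
`run_clone("http://git.example.org/repo.git", "repo", "http://127.0.0.1:3128")`.
Repeated jobs then hit the proxy's copy of the pack files until the
next full repack on the origin changes them.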