* [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
@ 2005-09-13 15:38 Sergey Vlasov
2005-09-13 15:59 ` Junio C Hamano
2005-09-14 17:17 ` sf
0 siblings, 2 replies; 9+ messages in thread
From: Sergey Vlasov @ 2005-09-13 15:38 UTC
To: git; +Cc: Junio C Hamano
By default the curl library adds a "Pragma: no-cache" header to all
requests, which disables caching by proxy servers. However, most
files in a GIT repository are immutable, and caching them is safe and
could be useful.
This patch removes the "Pragma: no-cache" header from requests for all
files except the pack list (objects/info/packs) and references
(refs/*), which are really mutable and should not be cached.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
---
http-fetch.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
fd3f39120636d1ef8834845aa80475c1664a3a3e
diff --git a/http-fetch.c b/http-fetch.c
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -14,6 +14,7 @@
#endif
static CURL *curl;
+static struct curl_slist *no_pragma_header;
static char *base;
@@ -102,6 +103,7 @@ static int fetch_index(unsigned char *sh
curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
curl_easy_setopt(curl, CURLOPT_URL, url);
+ curl_easy_setopt(curl, CURLOPT_HTTPHEADER, no_pragma_header);
if (curl_easy_perform(curl)) {
fclose(indexfile);
@@ -152,6 +154,7 @@ static int fetch_indices(void)
curl_easy_setopt(curl, CURLOPT_FILE, &buffer);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_buffer);
curl_easy_setopt(curl, CURLOPT_URL, url);
+ curl_easy_setopt(curl, CURLOPT_HTTPHEADER, NULL);
if (curl_easy_perform(curl)) {
return error("Unable to get pack index %s", url);
@@ -215,6 +218,7 @@ static int fetch_pack(unsigned char *sha
curl_easy_setopt(curl, CURLOPT_FILE, packfile);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
curl_easy_setopt(curl, CURLOPT_URL, url);
+ curl_easy_setopt(curl, CURLOPT_HTTPHEADER, no_pragma_header);
if (curl_easy_perform(curl)) {
fclose(packfile);
@@ -255,6 +259,7 @@ int fetch(unsigned char *sha1)
curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1);
curl_easy_setopt(curl, CURLOPT_FILE, NULL);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_sha1_file);
+ curl_easy_setopt(curl, CURLOPT_HTTPHEADER, no_pragma_header);
url = xmalloc(strlen(base) + 50);
strcpy(url, base);
@@ -303,6 +308,7 @@ int fetch_ref(char *ref, unsigned char *
curl_easy_setopt(curl, CURLOPT_FILE, &buffer);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_buffer);
+ curl_easy_setopt(curl, CURLOPT_HTTPHEADER, NULL);
url = xmalloc(strlen(base) + 6 + strlen(ref));
strcpy(url, base);
@@ -354,6 +360,7 @@ int main(int argc, char **argv)
curl_global_init(CURL_GLOBAL_ALL);
curl = curl_easy_init();
+ no_pragma_header = curl_slist_append(no_pragma_header, "Pragma:");
curl_ssl_verify = getenv("GIT_SSL_NO_VERIFY") ? 0 : 1;
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, curl_ssl_verify);
@@ -366,6 +373,7 @@ int main(int argc, char **argv)
if (pull(commit_id))
return 1;
+ curl_slist_free_all(no_pragma_header);
curl_global_cleanup();
return 0;
}
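The policy described in the commit message can be sketched as a standalone predicate. This is only an illustration: `is_mutable_path` is a hypothetical helper that does not appear in the patch, which instead sets CURLOPT_HTTPHEADER separately in each fetch function. The two path patterns are taken from the commit message above.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): returns 1 if a
 * repository-relative path names a file that may change in place and
 * should therefore keep curl's default "Pragma: no-cache" header,
 * and 0 if the file can be treated as cacheable.
 */
static int is_mutable_path(const char *path)
{
    assert(path != NULL);
    if (strcmp(path, "objects/info/packs") == 0)
        return 1;   /* the pack list changes when packs are added */
    if (strncmp(path, "refs/", 5) == 0)
        return 1;   /* refs move as history advances */
    return 0;       /* objects, packs and pack indices */
}
```

Under these assumptions, a path such as "refs/heads/master" would keep the default header, while a pack file or a loose object would be requested with the header removed.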
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: Junio C Hamano @ 2005-09-13 15:59 UTC
To: Sergey Vlasov; +Cc: git
Sergey Vlasov <vsu@altlinux.ru> writes:
> This patch removes the "Pragma: no-cache" header from requests for all
> files except the pack list (objects/info/packs) and references
> (refs/*), which are really mutable and should not be cached.
Thanks. What the patch does looks reasonable.
Do you know if we can use it for any reasonably recent version
of curl? I seem to recall we already do things slightly
differently depending on LIBCURL_VERSION_NUM.
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: Sergey Vlasov @ 2005-09-14 13:12 UTC
To: Junio C Hamano; +Cc: git
On Tue, Sep 13, 2005 at 08:59:26AM -0700, Junio C Hamano wrote:
> Sergey Vlasov <vsu@altlinux.ru> writes:
>
> > This patch removes the "Pragma: no-cache" header from requests for all
> > files except the pack list (objects/info/packs) and references
> > (refs/*), which are really mutable and should not be cached.
>
> Thanks. What the patch does looks reasonable.
>
> Do you know if we can use it for any reasonably recent version
> of curl? I seem to recall we already do things slightly
> differently depending on LIBCURL_VERSION_NUM.
http://cool.haxx.se/cvs.cgi/curl/include/curl/curl.h?rev=1.1&content-type=text/vnd.viewcvs-markup
shows that CURLOPT_HTTPHEADER, curl_slist_append() and
curl_slist_free_all() were available in Dec 1999, curl 6.3.1. The FAQ
entry about disabling "Pragma: no-cache" is from Aug 2, 2000.
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: Junio C Hamano @ 2005-09-14 16:28 UTC
To: Sergey Vlasov; +Cc: git
Sergey Vlasov <vsu@altlinux.ru> writes:
>> Do you know if we can use it for any reasonably recent version
>> of curl? I seem to recall we already do things slightly
>> differently depending on LIBCURL_VERSION_NUM.
>
> http://cool.haxx.se/cvs.cgi/curl/include/curl/curl.h?rev=1.1&content-type=text/vnd.viewcvs-markup
> shows...
Cool. I'll not worry about version dependency for this one,
then.
Thanks for the pointer -- I really appreciate it when people
teach others how to find the answer themselves the next
time the need arises.
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: sf @ 2005-09-14 17:17 UTC
To: git
Sergey Vlasov wrote:
> By default the curl library adds "Pragma: no-cache" header to all
> requests, which disables caching by proxy servers. However, most
> files in a GIT repository are immutable, and caching them is safe and
> could be useful.
Is caching really safe? Because of compression one git object can have
many file representations.
Regards
Stephan
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: David Lang @ 2005-09-19 0:23 UTC
To: sf-git; +Cc: git
On Wed, 14 Sep 2005, sf wrote:
> Sergey Vlasov wrote:
>> By default the curl library adds "Pragma: no-cache" header to all
>> requests, which disables caching by proxy servers. However, most
>> files in a GIT repository are immutable, and caching them is safe and
>> could be useful.
>
> Is caching really safe? Because of compression one git object can have many
> file representations.
Only if you use different compression algorithms.
Remember that git objects are identified by their sha1; if the sha1 is
what you want (and the file matches the sha1 after you decompress it), then
it really doesn't matter what its on-disk representation is.
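A toy sketch of this argument, under loud assumptions: a run-length code stands in for zlib, and a 32-bit FNV-1a hash stands in for sha1. None of these names come from git; they only illustrate that two different byte streams can decode to the same content and therefore to the same identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy decoder (stand-in for zlib inflate): input is (count, byte) pairs.
 * Returns the decoded length, or 0 if the output buffer is too small. */
static size_t rle_decode(const unsigned char *in, size_t in_len,
                         unsigned char *out, size_t out_cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i + 1 < in_len; i += 2) {
        for (unsigned c = 0; c < in[i]; c++) {
            if (n == out_cap)
                return 0;
            out[n++] = in[i + 1];
        }
    }
    return n;
}

/* FNV-1a, 32 bits (stand-in for sha1 over the decoded content). */
static uint32_t content_id(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns 1 if both stored representations decode to content with the
 * same identifier. Empty or oversized objects are rejected (returns 0)
 * to keep the sketch short. */
static int same_object(const unsigned char *a, size_t a_len,
                       const unsigned char *b, size_t b_len)
{
    unsigned char da[256], db[256];
    size_t na = rle_decode(a, a_len, da, sizeof da);
    size_t nb = rle_decode(b, b_len, db, sizeof db);

    if (na == 0 || nb == 0)
        return 0;
    return content_id(da, na) == content_id(db, nb);
}
```

For instance, the streams {4,'a',2,'b'} and {2,'a',2,'a',1,'b',1,'b'} differ as bytes, yet both decode to "aaaabb", so `same_object` treats them as the same object.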
David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: sf @ 2005-09-19 10:24 UTC
To: git
David Lang wrote:
> On Wed, 14 Sep 2005, sf wrote:
>
>> Sergey Vlasov wrote:
>>
>>> By default the curl library adds "Pragma: no-cache" header to all
>>> requests, which disables caching by proxy servers. However, most
>>> files in a GIT repository are immutable, and caching them is safe and
>>> could be useful.
>>
>>
>> Is caching really safe? Because of compression one git object can have
>> many file representations.
>
>
> only if you use different compression algorithms.
Even different implementations and different compression levels can lead
to different file representations.
> remember that git objects are identified by their sha1, if the sha1 is
> what you want (and the file matches the sha1 after you decompress it)
> then it really doesn't matter what its on-disk representation is.
You are arguing on the git tool level but we are talking about HTTP
which knows nothing about the uncompressed sha1.
The OP assumed that "files in a GIT repository are immutable" which is
not true. If you consider the sequence
pack -> prune -> update zlib or git -> unpack
you can end up with different files if the new zlib implementation
changes incompatibly (with respect to byte-by-byte compression results)
or if git suddenly does not use the default compression level any more.
And surely in the future there will be other git implementations than
this one which may not even use zlib.
I am not saying that caching is impossible, but HTTP caching has
its pitfalls, so just be careful.
Regards
Stephan
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: Petr Baudis @ 2005-09-19 13:35 UTC
To: sf; +Cc: git
Dear diary, on Mon, Sep 19, 2005 at 12:24:45PM CEST, I got a letter
where sf <sf@b-i-t.de> told me that...
> >remember that git objects are identified by their sha1, if the sha1 is
> >what you want (and the file matches the sha1 after you decompress it)
> >then it really doesn't matter what its on-disk representation is.
>
> You are arguing on the git tool level but we are talking about HTTP
> which knows nothing about the uncompressed sha1.
>
> The OP assumed that "files in a GIT repository are immutable" which is
> not true. If you consider the sequence
>
> pack -> prune -> update zlib or git -> unpack
>
> you can end up with different files if the new zlib implementation
> changes incompatibly (with respect to byte-by-byte compression results)
> or if git suddenly does not use the default compression level any more.
Yes, but why should this matter? It shouldn't matter whether you get the old
"version" or the new version of the file over HTTP; the actual object's
contents are still the same, and GIT shouldn't care.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone. -- Alan Cox
* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
From: sf @ 2005-09-20 11:34 UTC
To: git
Petr Baudis wrote:
> Dear diary, on Mon, Sep 19, 2005 at 12:24:45PM CEST, I got a letter
> where sf <sf@b-i-t.de> told me that...
...
>> The OP assumed that "files in a GIT repository are immutable" which is
>> not true. If you consider the sequence
>>
>> pack -> prune -> update zlib or git -> unpack
>>
>> you can end up with different files if the new zlib implementation
>> changes incompatibly (with respect to byte-by-byte compression results)
>> or if git suddenly does not use the default compression level any more.
>
> Yes, but why should this matter? It shouldn't matter if you get the old
> "version" or the new version of the file over HTTP, the actual object's
> contents are still the same, and GIT shouldn't care.
>
This is correct as long as you take care to always get each file in one go.
Recently there was talk about how git handles objects larger than 4GB.
But you do not have to go this far. Think about fetching 1MB (or 10MB or
100MB) compressed objects over a slow link. If the transfer gets
interrupted, some people or some clever piece of software - perhaps even
in git-core - might try to resume the interrupted download. If the
file representation has changed in the meantime, the downloaded file is
going to be corrupt.
The git tools will of course take note of the corruption but then the
head scratching begins: "What went wrong?"
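The failure mode can be simulated at the byte level. This is a sketch under stated assumptions, not code from git: `splice_resume` is a hypothetical helper that models a naive resume, keeping the bytes already on disk and appending whatever the server holds now from the same offset onward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a naive resumed download: keep the first `have`
 * bytes already received (from the old server-side file) and append the
 * remainder of the file the server holds now, starting at offset `have`.
 * Returns the length of the spliced file, or 0 if the arguments do not
 * fit (offset past the end of the new file, or output buffer too small).
 */
static size_t splice_resume(const unsigned char *partial, size_t have,
                            const unsigned char *now, size_t now_len,
                            unsigned char *out, size_t out_cap)
{
    assert(partial != NULL && now != NULL && out != NULL);
    if (have > now_len || now_len > out_cap)
        return 0;
    memcpy(out, partial, have);                     /* old bytes on disk */
    memcpy(out + have, now + have, now_len - have); /* new bytes from server */
    return now_len;
}
```

If the server-side file is unchanged, the splice reproduces it exactly. If the file was rewritten in between, the result starts with bytes of the old representation and ends with bytes of the new one, so it generally matches neither, and the mismatch only surfaces later when the content is checked against its sha1.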
The more I think about this, the more I realize that my worries are not
about caching but about HTTP fetching in general.
Regards
Stephan