[PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers

git.vger.kernel.org archive mirror

* [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
@ 2005-09-13 15:38 Sergey Vlasov
  2005-09-13 15:59 ` Junio C Hamano
  2005-09-14 17:17 ` sf
  0 siblings, 2 replies; 9+ messages in thread
From: Sergey Vlasov @ 2005-09-13 15:38 UTC
  To: git; +Cc: Junio C Hamano

By default the curl library adds a "Pragma: no-cache" header to all
requests, which disables caching by proxy servers.  However, most
files in a GIT repository are immutable, and caching them is safe and
could be useful.

This patch removes the "Pragma: no-cache" header from requests for all
files except the pack list (objects/info/packs) and references
(refs/*), which are really mutable and should not be cached.
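The caching rule stated above can be sketched as a small C predicate. The function name path_is_mutable is hypothetical and does not appear in http-fetch.c; the patch instead makes the same decision at each call site, passing NULL (keep curl's default headers) in fetch_indices() and fetch_ref(), and no_pragma_header elsewhere. That list contains "Pragma:" with nothing after the colon, which tells curl to drop its built-in header.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns 1 if a path
 * relative to the repository URL names a mutable file, which should
 * keep curl's default "Pragma: no-cache" header, and 0 if the file is
 * immutable and may safely be served from a proxy cache.
 */
static int path_is_mutable(const char *path)
{
	assert(path != NULL);

	/* The pack list changes whenever packs are added or removed. */
	if (strcmp(path, "objects/info/packs") == 0)
		return 1;

	/* Refs move whenever new commits are pushed. */
	if (strncmp(path, "refs/", 5) == 0)
		return 1;

	/* Loose objects, packs and pack indices are named by SHA-1. */
	return 0;
}
```

Keeping the policy in one predicate like this would centralize it, at the cost of deriving a relative path from each URL; the per-call-site choice in the patch avoids that string handling.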

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>


---

 http-fetch.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

fd3f39120636d1ef8834845aa80475c1664a3a3e
diff --git a/http-fetch.c b/http-fetch.c
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -14,6 +14,7 @@
 #endif
 
 static CURL *curl;
+static struct curl_slist *no_pragma_header;
 
 static char *base;
 
@@ -102,6 +103,7 @@ static int fetch_index(unsigned char *sh
 	curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
 	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
 	curl_easy_setopt(curl, CURLOPT_URL, url);
+	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, no_pragma_header);
 	
 	if (curl_easy_perform(curl)) {
 		fclose(indexfile);
@@ -152,6 +154,7 @@ static int fetch_indices(void)
 	curl_easy_setopt(curl, CURLOPT_FILE, &buffer);
 	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_buffer);
 	curl_easy_setopt(curl, CURLOPT_URL, url);
+	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, NULL);
 	
 	if (curl_easy_perform(curl)) {
 		return error("Unable to get pack index %s", url);
@@ -215,6 +218,7 @@ static int fetch_pack(unsigned char *sha
 	curl_easy_setopt(curl, CURLOPT_FILE, packfile);
 	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
 	curl_easy_setopt(curl, CURLOPT_URL, url);
+	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, no_pragma_header);
 	
 	if (curl_easy_perform(curl)) {
 		fclose(packfile);
@@ -255,6 +259,7 @@ int fetch(unsigned char *sha1)
 	curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1);
 	curl_easy_setopt(curl, CURLOPT_FILE, NULL);
 	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_sha1_file);
+	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, no_pragma_header);
 
 	url = xmalloc(strlen(base) + 50);
 	strcpy(url, base);
@@ -303,6 +308,7 @@ int fetch_ref(char *ref, unsigned char *
         
         curl_easy_setopt(curl, CURLOPT_FILE, &buffer);
         curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_buffer);
+	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, NULL);
 
         url = xmalloc(strlen(base) + 6 + strlen(ref));
         strcpy(url, base);
@@ -354,6 +360,7 @@ int main(int argc, char **argv)
 	curl_global_init(CURL_GLOBAL_ALL);
 
 	curl = curl_easy_init();
+	no_pragma_header = curl_slist_append(no_pragma_header, "Pragma:");
 
 	curl_ssl_verify = getenv("GIT_SSL_NO_VERIFY") ? 0 : 1;
 	curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, curl_ssl_verify);
@@ -366,6 +373,7 @@ int main(int argc, char **argv)
 	if (pull(commit_id))
 		return 1;
 
+	curl_slist_free_all(no_pragma_header);
 	curl_global_cleanup();
 	return 0;
 }

* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
  2005-09-13 15:38 [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers Sergey Vlasov
@ 2005-09-13 15:59 ` Junio C Hamano
  2005-09-14 13:12   ` Sergey Vlasov
  2005-09-14 17:17 ` sf
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2005-09-13 15:59 UTC
  To: Sergey Vlasov; +Cc: git

Sergey Vlasov <vsu@altlinux.ru> writes:

> This patch removes the "Pragma: no-cache" header from requests for all
> files except the pack list (objects/info/packs) and references
> (refs/*), which are really mutable and should not be cached.

Thanks.  What the patch does looks reasonable.

Do you know if we can use it for any reasonably recent version
of curl?  I seem to recall we already do things slightly
differently depending on LIBCURL_VERSION_NUM.

* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
  2005-09-13 15:59 ` Junio C Hamano
@ 2005-09-14 13:12   ` Sergey Vlasov
  2005-09-14 16:28     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Sergey Vlasov @ 2005-09-14 13:12 UTC
  To: Junio C Hamano; +Cc: git

On Tue, Sep 13, 2005 at 08:59:26AM -0700, Junio C Hamano wrote:
> Sergey Vlasov <vsu@altlinux.ru> writes:
> 
> > This patch removes the "Pragma: no-cache" header from requests for all
> > files except the pack list (objects/info/packs) and references
> > (refs/*), which are really mutable and should not be cached.
> 
> Thanks.  What the patch does looks reasonable.
> 
> Do you know if we can use it for any reasonably recent version
> of curl?  I seem to recall we already do things slightly
> differently depending on LIBCURL_VERSION_NUM.

http://cool.haxx.se/cvs.cgi/curl/include/curl/curl.h?rev=1.1&content-type=text/vnd.viewcvs-markup
shows that CURLOPT_HTTPHEADER, curl_slist_append() and
curl_slist_free_all() were available in Dec 1999, curl 6.3.1.  The FAQ
entry about disabling "Pragma: no-cache" is from Aug 2, 2000.


* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
  2005-09-14 13:12   ` Sergey Vlasov
@ 2005-09-14 16:28     ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2005-09-14 16:28 UTC
  To: Sergey Vlasov; +Cc: git

Sergey Vlasov <vsu@altlinux.ru> writes:

>> Do you know if we can use it for any reasonably recent version
>> of curl?  I seem to recall we already do things slightly
>> differently depending on LIBCURL_VERSION_NUM.
>
> http://cool.haxx.se/cvs.cgi/curl/include/curl/curl.h?rev=1.1&content-type=text/vnd.viewcvs-markup
> shows...

Cool.  I'll not worry about version dependency for this one,
then.

Thanks for the pointer -- I really appreciate it when people
teach others how to find the answer themselves the next
time the need arises.

* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
  2005-09-13 15:38 [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers Sergey Vlasov
  2005-09-13 15:59 ` Junio C Hamano
@ 2005-09-14 17:17 ` sf
  2005-09-19  0:23   ` [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers David Lang
  1 sibling, 1 reply; 9+ messages in thread
From: sf @ 2005-09-14 17:17 UTC
  To: git

Sergey Vlasov wrote:
> By default the curl library adds "Pragma: no-cache" header to all
> requests, which disables caching by proxy servers.  However, most
> files in a GIT repository are immutable, and caching them is safe and
> could be useful.

Is caching really safe? Because of compression one git object can have 
many file representations.

Regards
	Stephan

* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
  2005-09-14 17:17 ` sf
@ 2005-09-19  0:23   ` David Lang
  2005-09-19 10:24     ` sf
  0 siblings, 1 reply; 9+ messages in thread
From: David Lang @ 2005-09-19  0:23 UTC
  To: sf-git; +Cc: git

On Wed, 14 Sep 2005, sf wrote:

> Sergey Vlasov wrote:
>> By default the curl library adds "Pragma: no-cache" header to all
>> requests, which disables caching by proxy servers.  However, most
>> files in a GIT repository are immutable, and caching them is safe and
>> could be useful.
>
> Is caching really safe? Because of compression one git object can have many 
> file representations.

Only if you use different compression algorithms.

Remember that git objects are identified by their sha1.  If the sha1 is
what you want (and the file matches the sha1 after you decompress it),
then it really doesn't matter what its on-disk representation is.
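The argument can be made concrete with a toy model, sketched here in C. Both the encoding (a trivial run-length format standing in for zlib) and the hash (32-bit FNV-1a standing in for SHA-1) are hypothetical simplifications chosen only to keep the example short; real git deflates each object with zlib and names it by the SHA-1 of its uncompressed form (a short type/size header followed by the content). The names toy_inflate, toy_hash and same_object are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SHA-1 (hypothetical simplification): 32-bit FNV-1a. */
static uint32_t toy_hash(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Stand-in for zlib (hypothetical): the "compressed" file is a sequence
 * of (count, byte) pairs, count being an ASCII digit '1'..'9', so the
 * same content has several valid encodings.  Returns the decoded length,
 * or -1 if the input is malformed or does not fit in cap bytes.
 */
static long toy_inflate(const char *enc, unsigned char *out, size_t cap)
{
	size_t len = strlen(enc), n = 0;

	if (len % 2 != 0)
		return -1;
	for (size_t i = 0; i < len; i += 2) {
		if (enc[i] < '1' || enc[i] > '9')
			return -1;
		for (int k = 0; k < enc[i] - '0'; k++) {
			if (n == cap)
				return -1;
			out[n++] = (unsigned char)enc[i + 1];
		}
	}
	return (long)n;
}

/*
 * Returns 1 if two on-disk files hold the same object (same hash of the
 * decoded content), 0 if not, and -1 if either file is malformed.
 */
static int same_object(const char *file_a, const char *file_b)
{
	unsigned char a[256], b[256];
	long na, nb;

	assert(file_a != NULL && file_b != NULL);
	na = toy_inflate(file_a, a, sizeof a);
	nb = toy_inflate(file_b, b, sizeof b);
	if (na < 0 || nb < 0)
		return -1;
	return toy_hash(a, (size_t)na) == toy_hash(b, (size_t)nb);
}
```

Here "3a2b" and "2a1a2b" differ byte for byte, which is all an HTTP cache can see, yet both decode to "aaabb" and therefore identify the same object.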

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
  2005-09-19  0:23   ` [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers David Lang
@ 2005-09-19 10:24     ` sf
  2005-09-19 13:35       ` Petr Baudis
  0 siblings, 1 reply; 9+ messages in thread
From: sf @ 2005-09-19 10:24 UTC
  To: git

David Lang wrote:
> On Wed, 14 Sep 2005, sf wrote:
> 
>> Sergey Vlasov wrote:
>>
>>> By default the curl library adds "Pragma: no-cache" header to all
>>> requests, which disables caching by proxy servers.  However, most
>>> files in a GIT repository are immutable, and caching them is safe and
>>> could be useful.
>>
>>
>> Is caching really safe? Because of compression one git object can have 
>> many file representations.
> 
> 
> Only if you use different compression algorithms.

Even different implementations and different compression levels can lead 
to different file representations.

> Remember that git objects are identified by their sha1.  If the sha1 is
> what you want (and the file matches the sha1 after you decompress it),
> then it really doesn't matter what its on-disk representation is.

You are arguing at the git tool level, but we are talking about HTTP,
which knows nothing about the sha1 of the uncompressed content.

The OP assumed that "files in a GIT repository are immutable" which is 
not true. If you consider the sequence

pack -> prune -> update zlib or git -> unpack

you can end up with different files if the new zlib implementation 
changes incompatibly (with respect to byte-by-byte compression results)
or if git suddenly does not use the default compression level any more.

And surely in the future there will be other git implementations besides
this one, some of which may not even use zlib.

I am not saying that caching is impossible, but HTTP caching has
its pitfalls, so just be careful.

Regards

	Stephan

* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers
  2005-09-19 10:24     ` sf
@ 2005-09-19 13:35       ` Petr Baudis
  2005-09-20 11:34         ` sf
  0 siblings, 1 reply; 9+ messages in thread
From: Petr Baudis @ 2005-09-19 13:35 UTC
  To: sf; +Cc: git

Dear diary, on Mon, Sep 19, 2005 at 12:24:45PM CEST, I got a letter
where sf <sf@b-i-t.de> told me that...
> >Remember that git objects are identified by their sha1.  If the sha1 is
> >what you want (and the file matches the sha1 after you decompress it),
> >then it really doesn't matter what its on-disk representation is.
> 
> You are arguing at the git tool level, but we are talking about HTTP,
> which knows nothing about the sha1 of the uncompressed content.
> 
> The OP assumed that "files in a GIT repository are immutable" which is 
> not true. If you consider the sequence
> 
> pack -> prune -> update zlib or git -> unpack
> 
> you can end up with different files if the new zlib implementation 
> changes incompatibly (with respect to byte-by-byte compression results) 
> or if git suddenly does not use the default compression level any more.

Yes, but why should this matter? It shouldn't matter if you get the old
"version" or the new version of the file over HTTP; the actual object's
contents are still the same, and GIT shouldn't care.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone.  -- Alan Cox

* Re: [PATCH] git-http-fetch: Allow caching of retrieved objects byproxy servers
  2005-09-19 13:35       ` Petr Baudis
@ 2005-09-20 11:34         ` sf
  0 siblings, 0 replies; 9+ messages in thread
From: sf @ 2005-09-20 11:34 UTC
  To: git

Petr Baudis wrote:
> Dear diary, on Mon, Sep 19, 2005 at 12:24:45PM CEST, I got a letter
> where sf <sf@b-i-t.de> told me that...
...
>> The OP assumed that "files in a GIT repository are immutable" which is 
>> not true. If you consider the sequence
>> 
>> pack -> prune -> update zlib or git -> unpack
>> 
>> you can end up with different files if the new zlib implementation 
>> changes incompatibly (with respect to byte-by-byte compression results) 
>> or if git suddenly does not use the default compression level any more.
> 
> Yes, but why should this matter? It shouldn't matter if you get the old
> "version" or the new version of the file over HTTP; the actual object's
> contents are still the same, and GIT shouldn't care.
> 

This is correct as long as you take care to always get each file in one go.

Recently there was talk about how git handles objects larger than 4GB.
But you do not have to go that far.  Think about fetching 1MB (or 10MB or
100MB) compressed objects over a slow link.  If the transfer gets
interrupted, some people or some clever piece of software (perhaps even
in git-core) might try to resume the interrupted download.  If the file
representation has changed in the meantime, the downloaded file is
going to be corrupt.

The git tools will of course take note of the corruption but then the 
head scratching begins: "What went wrong?"
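The resumed-download scenario can be modelled in a few lines of C. The sketch assumes the same content, "aaabb", stored under two hypothetical encodings: "3a2b" before the repository was repacked and "2a1a2b" afterwards (each pair is a repeat count followed by a byte). The function name resume_download is illustrative only; it is not part of git or libcurl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of resuming an interrupted HTTP download: the first
 * `offset` bytes were received from the representation that was on the
 * server when the transfer started (old_rep); the remainder is requested
 * again, from `offset` onwards, after the server has replaced the file
 * with a different representation (new_rep) of the same object.
 * Writes the NUL-terminated result to out and returns 0, or returns -1
 * if offset is past the end of old_rep or the result does not fit.
 */
static int resume_download(const char *old_rep, const char *new_rep,
			   size_t offset, char *out, size_t cap)
{
	size_t old_len, new_len, tail;

	assert(old_rep != NULL && new_rep != NULL && out != NULL);
	old_len = strlen(old_rep);
	new_len = strlen(new_rep);
	if (offset > old_len)
		return -1;
	/* Bytes still to fetch from the (changed) file on the server. */
	tail = offset < new_len ? new_len - offset : 0;
	if (offset + tail + 1 > cap)
		return -1;
	memcpy(out, old_rep, offset);
	if (tail > 0)
		memcpy(out + offset, new_rep + offset, tail);
	out[offset + tail] = '\0';
	return 0;
}
```

Resuming at byte 2 produces "3a1a2b", which matches neither representation and would decode to "aaaabb" instead of "aaabb". Only a check on the decoded content, such as git's SHA-1 verification, notices the damage, and nothing in the result points at the real cause.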

The more I think about this, the more I realize that my worries have
nothing to do with caching, but with HTTP fetching in general.

Regards

	Stephan

Thread overview: 9+ messages
2005-09-13 15:38 [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers Sergey Vlasov
2005-09-13 15:59 ` Junio C Hamano
2005-09-14 13:12   ` Sergey Vlasov
2005-09-14 16:28     ` Junio C Hamano
2005-09-14 17:17 ` sf
2005-09-19  0:23   ` [PATCH] git-http-fetch: Allow caching of retrieved objects by proxy servers David Lang
2005-09-19 10:24     ` sf
2005-09-19 13:35       ` Petr Baudis
2005-09-20 11:34         ` sf
