From: Nick Hengeveld <nickh@reactrix.com>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Cloning from sites with 404 overridden
Date: Thu, 23 Mar 2006 10:43:51 -0800
Message-ID: <20060323184351.GA3892@reactrix.com>
In-Reply-To: <7vacbi8eu1.fsf@assigned-by-dhcp.cox.net>
On Wed, Mar 22, 2006 at 11:22:14AM -0800, Junio C Hamano wrote:
> You probably need only one bit here,...
> ... and note if that is an HTML document or not.
/me smacks self...
> However the patch would not help when such a server also did a
> "Sorry, did you mistype the URL?" HTML response, and I was
> wondering how typical that would be.
Seems like there are three cases to worry about:
1) the server returns a 200 status and a text/html response instead of a
   404, and the server's default content type is not text/html
2) the server returns a 200 status and a text/html response instead of a
   404, and the server's default content type is text/html
3) the server returns a corrupt object from the repository
I don't think there's a way to distinguish between #2 and #3, so all we
can really do is display as helpful an error message as possible.
We can detect #1 once a loose object has been transferred successfully,
by recording whether the content type the repo serves for objects is
text/html. In that case, should http-fetch behave as if the server had
returned a 404? If no loose object has been transferred successfully
yet, we'd have to respond as with #2. This approach could break if
requests are load-balanced across servers with different
misconfigurations, but I think trying to detect that is bending over
backwards a little too far.
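To make the decision above concrete, here is a rough standalone sketch of
the bookkeeping. It is not the patch itself: the helper names and the enum
are made up for illustration, and unlike the patch (which uses an exact
strcmp) it assumes we would also want to accept a Content-Type value that
carries parameters such as "; charset=...".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tri-state, mirroring the -1/0/1 values used for the repo flag. */
enum html_default {
	HTML_DEFAULT_UNKNOWN = -1, /* no loose object fetched successfully yet */
	HTML_DEFAULT_NO = 0,       /* good objects are not served as text/html */
	HTML_DEFAULT_YES = 1       /* good objects are served as text/html */
};

/*
 * Return 1 if a Content-Type header value names text/html, with or
 * without trailing parameters. The comparison is case-sensitive, which
 * is a simplification.
 */
static int is_html_content_type(const char *content_type)
{
	static const char html[] = "text/html";
	size_t n = sizeof(html) - 1;

	if (!content_type || strncmp(content_type, html, n))
		return 0;
	return content_type[n] == '\0' || content_type[n] == ';' ||
	       content_type[n] == ' ' || content_type[n] == '\t';
}

/* After a verified object transfer, remember what the repo serves by default. */
static void note_successful_transfer(int *repo_default, int response_is_html)
{
	if (*repo_default == HTML_DEFAULT_UNKNOWN)
		*repo_default = response_is_html ? HTML_DEFAULT_YES
						 : HTML_DEFAULT_NO;
}

/*
 * Case #1 only: an HTML response from a repo known not to serve objects
 * as HTML. An unknown default falls through to the corrupt-object path,
 * as with case #2.
 */
static int looks_like_missing_object(int repo_default, int response_is_html)
{
	return response_is_html && repo_default == HTML_DEFAULT_NO;
}
```

The first successful transfer fixes the default, so later responses do not
flip it back and forth.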
On a related note, I noticed that http-fetch keeps inflating and
SHA1-updating the response after an inflate error has been detected.
It's probably not a huge deal, but we could either error out
immediately at that point or at least stop the unnecessary processing.
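For the "stop the unnecessary processing" option, the idea would be
roughly as follows. This is only a sketch with made-up names: the struct,
the callback shape and the toy decoder are stand-ins, not http-fetch's
actual write callback or zlib's inflate().

```c
#include <stddef.h>

/* Hypothetical per-request state; field names are illustrative only. */
struct stream_state {
	int failed;            /* set once a chunk fails to decode */
	size_t chunks_decoded; /* chunks that were actually processed */
};

/* Stand-in for the inflate step: returns 0 on success, -1 on error. */
typedef int (*decode_fn)(const char *buf, size_t len);

/* Toy decoder for demonstration: treats a chunk starting with '!' as bad. */
static int toy_decode(const char *buf, size_t len)
{
	return (len > 0 && buf[0] == '!') ? -1 : 0;
}

/*
 * Shaped like a write callback that reports how many bytes it consumed.
 * After the first decode error the remaining data is swallowed without
 * further decoding or hashing. With libcurl, returning a count different
 * from the amount passed in would abort the transfer instead, which is
 * the "error out immediately" option.
 */
static size_t consume_chunk(struct stream_state *st, decode_fn decode,
			    const char *buf, size_t len)
{
	if (st->failed)
		return len;
	if (decode(buf, len) < 0) {
		st->failed = 1;
		return len;
	}
	st->chunks_decoded++;
	return len;
}
```

Swallowing the rest keeps the transfer result intact, so the existing error
reporting still sees the HTTP status.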
Something like this for detecting case #1? Tested by cloning
http://digilander.libero.it/mcostalba/scm/qgit.git
[PATCH] http-fetch: try to detect 404s from misconfigured servers
Some HTTP server environments return a 200 status with a text/html
error document, or a redirect to one, rather than a 404 status when a
loose object does not exist. This patch tries to detect such a response
and treat it as a 404.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
---
http-fetch.c | 24 ++++++++++++++++++++++--
1 files changed, 22 insertions(+), 2 deletions(-)
ab97429c5b0a4b4466ee0072f75706399e42b675
diff --git a/http-fetch.c b/http-fetch.c
index dc67218..bb75050 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -16,6 +16,7 @@ struct alt_base
{
char *base;
int got_indices;
+ int default_html_content_type;
struct packed_git *packs;
struct alt_base *next;
};
@@ -41,6 +42,7 @@ struct object_request
CURLcode curl_result;
char errorstr[CURL_ERROR_SIZE];
long http_code;
+ char html_content_type;
unsigned char real_sha1[20];
SHA_CTX c;
z_stream stream;
@@ -249,6 +251,9 @@ static void finish_object_request(struct
unlink(obj_req->tmpfile);
return;
}
+ if (obj_req->repo->default_html_content_type == -1)
+ obj_req->repo->default_html_content_type =
+ obj_req->html_content_type;
obj_req->rename =
move_temp_to_file(obj_req->tmpfile, obj_req->filename);
@@ -258,9 +263,15 @@ static void finish_object_request(struct
static void process_object_response(void *callback_data)
{
+ char *content_type;
struct object_request *obj_req =
(struct object_request *)callback_data;
+ curl_easy_getinfo(obj_req->slot->curl, CURLINFO_CONTENT_TYPE,
+ &content_type);
+ if (content_type && !strcmp(content_type, "text/html"))
+ obj_req->html_content_type = 1;
+
obj_req->curl_result = obj_req->slot->curl_result;
obj_req->http_code = obj_req->slot->http_code;
obj_req->slot = NULL;
@@ -340,6 +351,7 @@ void prefetch(unsigned char *sha1)
memcpy(newreq->sha1, sha1, 20);
newreq->repo = alt;
newreq->url = NULL;
+ newreq->html_content_type = 0;
newreq->local = -1;
newreq->state = WAITING;
snprintf(newreq->filename, sizeof(newreq->filename), "%s", filename);
@@ -539,6 +551,7 @@ static void process_alternates_response(
newalt->next = NULL;
newalt->base = target;
newalt->got_indices = 0;
+ newalt->default_html_content_type = -1;
newalt->packs = NULL;
while (tail->next != NULL)
tail = tail->next;
@@ -835,8 +848,14 @@ static int fetch_object(struct alt_base
obj_req->errorstr, obj_req->curl_result,
obj_req->http_code, hex);
} else if (obj_req->zret != Z_STREAM_END) {
- corrupt_object_found++;
- ret = error("File %s (%s) corrupt", hex, obj_req->url);
+ if (obj_req->html_content_type &&
+ !obj_req->repo->default_html_content_type)
+ ret = -1; /* Be silent, looks like a 404 */
+ else {
+ corrupt_object_found++;
+ ret = error("File %s (%s) corrupt",
+ sha1_to_hex(obj_req->sha1), obj_req->url);
+ }
} else if (memcmp(obj_req->sha1, obj_req->real_sha1, 20)) {
ret = error("File %s has bad hash", hex);
} else if (obj_req->rename < 0) {
@@ -985,6 +1004,7 @@ int main(int argc, char **argv)
alt = xmalloc(sizeof(*alt));
alt->base = url;
alt->got_indices = 0;
+ alt->default_html_content_type = -1;
alt->packs = NULL;
alt->next = NULL;
--
1.2.4.gb1bc1d-dirty