From: Nick Hengeveld <nickh@reactrix.com>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Cloning from sites with 404 overridden
Date: Thu, 23 Mar 2006 10:43:51 -0800
Message-ID: <20060323184351.GA3892@reactrix.com>
In-Reply-To: <7vacbi8eu1.fsf@assigned-by-dhcp.cox.net>
On Wed, Mar 22, 2006 at 11:22:14AM -0800, Junio C Hamano wrote:
> You probably need only one bit here,...
> ... and note if that is an HTML document or not.
/me smacks self...
> However the patch would not help when such a server also did a
> "Sorry, did you mistype the URL?" HTML response, and I was
> wondering how typical that would be.
Seems like there are three cases to worry about:
1) the server returns a 200 status and a text/html response instead of a
404, and the server's default content type is not text/html
2) the server returns a 200 status and a text/html response instead of a
404, and the server's default content type is text/html
3) the server returns a corrupt object from the repository
I don't think there's a way to distinguish between #2 and #3, so all we
can really do is display as helpful an error message as possible.
We can detect #1 if there has been a previous successful loose object
transfer by tracking whether the repo's default content type is
text/html. In such a case should http-fetch behave as if the server
returned 404? If there have been no successful loose object transfers,
we'd have to respond as with #2. This approach could potentially break
if requests are load-balanced to servers with different
misconfigurations - but I think trying to detect that is bending over
backwards a little too far.
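A minimal sketch of the decision described above, assuming a tri-state flag per repository that stays "unknown" until one loose object has been fetched successfully. The names (classify_bad_object, HTML_UNKNOWN and so on) are hypothetical and are not taken from http-fetch.c:

```c
/* Hypothetical tri-state: is text/html the repo's content type for real objects? */
enum { HTML_UNKNOWN = -1, HTML_NO = 0, HTML_YES = 1 };

enum verdict { TREAT_AS_404, REPORT_CORRUPT };

/*
 * Classify a response that arrived with a 200 status but did not
 * inflate as a loose object.
 *
 * repo_default_html: HTML_UNKNOWN until a loose object has been
 *                    transferred successfully from this repo.
 * response_is_html:  nonzero if this response was sent as text/html.
 */
static enum verdict classify_bad_object(int repo_default_html,
					int response_is_html)
{
	/* Case 1: HTML from a server that does not normally send HTML. */
	if (response_is_html && repo_default_html == HTML_NO)
		return TREAT_AS_404;

	/*
	 * Cases 2 and 3, or no baseline yet: an error page cannot be told
	 * apart from a corrupt object, so report corruption.
	 */
	return REPORT_CORRUPT;
}
```

With no successful transfer yet (HTML_UNKNOWN), the sketch falls through to the corrupt-object report, which matches responding as with #2.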
On a related note, I noticed that http-fetch will continue to try
inflating/sha1_updating the response after an inflate error has been
detected. It's probably not a huge deal, but we could just error out
immediately at that point or at least stop the unnecessary processing.
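A rough sketch of the "stop the unnecessary processing" variant, written as a function shaped like a libcurl write callback. The struct, the field names and the stand-in decoder are all hypothetical; the real callback would call inflate() and update the SHA-1 instead:

```c
#include <stddef.h>

/* Hypothetical per-request state; only what the sketch needs. */
struct sketch_request {
	int failed;            /* set once a chunk fails to decode */
	size_t chunks_decoded; /* number of chunks handed to the decoder */
};

/* Stand-in for inflate(): a chunk starting with '<' counts as undecodable. */
static int sketch_decode(const char *data, size_t len)
{
	return (len > 0 && data[0] == '<') ? -1 : 0;
}

/*
 * Same signature as a libcurl write callback. After the first decode
 * error, later chunks are accepted but no longer decoded or hashed.
 */
static size_t sketch_write(void *ptr, size_t size, size_t nmemb,
			   void *userdata)
{
	struct sketch_request *req = userdata;
	size_t len = size * nmemb;

	if (req->failed)
		return len; /* consume the data, skip the work */

	req->chunks_decoded++;
	if (sketch_decode(ptr, len) < 0)
		req->failed = 1;

	return len;
}
```

To error out immediately instead, the callback could return a count different from size * nmemb, which libcurl treats as a write error and uses to abort the transfer. That changes the curl result seen by the caller, so the error reporting would need to account for it.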
Something like this? Tested by cloning
http://digilander.libero.it/mcostalba/scm/qgit.git
[PATCH] http-fetch: try to detect 404s from misconfigured servers
Some HTTP server environments return a 200 status and text/html error
document or a redirect to one rather than a 404 status if a loose
object does not exist. This patch tries to detect such a response
and treat it as a 404.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
---
http-fetch.c | 24 ++++++++++++++++++++++--
1 files changed, 22 insertions(+), 2 deletions(-)
ab97429c5b0a4b4466ee0072f75706399e42b675
diff --git a/http-fetch.c b/http-fetch.c
index dc67218..bb75050 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -16,6 +16,7 @@ struct alt_base
{
char *base;
int got_indices;
+ int default_html_content_type;
struct packed_git *packs;
struct alt_base *next;
};
@@ -41,6 +42,7 @@ struct object_request
CURLcode curl_result;
char errorstr[CURL_ERROR_SIZE];
long http_code;
+ char html_content_type;
unsigned char real_sha1[20];
SHA_CTX c;
z_stream stream;
@@ -249,6 +251,9 @@ static void finish_object_request(struct
unlink(obj_req->tmpfile);
return;
}
+ if (obj_req->repo->default_html_content_type == -1)
+ obj_req->repo->default_html_content_type =
+ obj_req->html_content_type;
obj_req->rename =
move_temp_to_file(obj_req->tmpfile, obj_req->filename);
@@ -258,9 +263,15 @@ static void finish_object_request(struct
static void process_object_response(void *callback_data)
{
+ char *content_type;
struct object_request *obj_req =
(struct object_request *)callback_data;
+ curl_easy_getinfo(obj_req->slot->curl, CURLINFO_CONTENT_TYPE,
+ &content_type);
+ if (content_type && !strcmp(content_type, "text/html"))
+ obj_req->html_content_type = 1;
+
obj_req->curl_result = obj_req->slot->curl_result;
obj_req->http_code = obj_req->slot->http_code;
obj_req->slot = NULL;
@@ -340,6 +351,7 @@ void prefetch(unsigned char *sha1)
memcpy(newreq->sha1, sha1, 20);
newreq->repo = alt;
newreq->url = NULL;
+ newreq->html_content_type = 0;
newreq->local = -1;
newreq->state = WAITING;
snprintf(newreq->filename, sizeof(newreq->filename), "%s", filename);
@@ -539,6 +551,7 @@ static void process_alternates_response(
newalt->next = NULL;
newalt->base = target;
newalt->got_indices = 0;
+ newalt->default_html_content_type = -1;
newalt->packs = NULL;
while (tail->next != NULL)
tail = tail->next;
@@ -835,8 +848,14 @@ static int fetch_object(struct alt_base
obj_req->errorstr, obj_req->curl_result,
obj_req->http_code, hex);
} else if (obj_req->zret != Z_STREAM_END) {
- corrupt_object_found++;
- ret = error("File %s (%s) corrupt", hex, obj_req->url);
+ if (obj_req->html_content_type &&
+ !obj_req->repo->default_html_content_type)
+ ret = -1; /* Be silent, looks like a 404 */
+ else {
+ corrupt_object_found++;
+ ret = error("File %s (%s) corrupt",
+ sha1_to_hex(obj_req->sha1), obj_req->url);
+ }
} else if (memcmp(obj_req->sha1, obj_req->real_sha1, 20)) {
ret = error("File %s has bad hash", hex);
} else if (obj_req->rename < 0) {
@@ -985,6 +1004,7 @@ int main(int argc, char **argv)
alt = xmalloc(sizeof(*alt));
alt->base = url;
alt->got_indices = 0;
+ alt->default_html_content_type = -1;
alt->packs = NULL;
alt->next = NULL;
--
1.2.4.gb1bc1d-dirty
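
One possible refinement to the content type check in the patch: a Content-Type value can carry parameters, for example "text/html; charset=iso-8859-1", and media type names are case-insensitive, so an exact strcmp() against "text/html" may miss some error pages. A minimal sketch of a more tolerant comparison, assuming POSIX strncasecmp() is available; the helper name is hypothetical and not part of the patch:

```c
#include <stddef.h>
#include <strings.h> /* strncasecmp (POSIX) */

/*
 * Hypothetical helper: nonzero if a Content-Type value names
 * text/html, ignoring case and any trailing parameters.
 */
static int is_html_content_type(const char *content_type)
{
	static const char html[] = "text/html";
	size_t n = sizeof(html) - 1;
	char next;

	if (!content_type || strncasecmp(content_type, html, n))
		return 0;

	/* Accept only end of string or the start of a parameter list. */
	next = content_type[n];
	return next == '\0' || next == ';' || next == ' ' || next == '\t';
}
```

The check on the character after the prefix keeps a longer type name such as "text/html5" from matching.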