From: Shawn Pearce <spearce@spearce.org>
To: linux@horizon.com
Cc: git@vger.kernel.org
Subject: Re: Cloning from sites with 404 overridden
Date: Tue, 21 Mar 2006 22:12:00 -0500
Message-ID: <20060322031200.GB17954@spearce.org>
In-Reply-To: <20060322025921.1722.qmail@science.horizon.com>

'0' x 40.  :-) There are some places already in the GIT source
which would have ``issues'' if they got an object with this hash.
Not sure if it is actually an entirely impossible hash or just one
that is highly improbable.

My own website has this problem, and it's because I'm using WordPress
to handle all URLs on the site; I haven't yet found a way to
configure WordPress to return a proper 404 when the URL can't be
mapped to something on the server.  Note that 404 status codes can
in fact return pretty HTML content for the user, and many websites
do this and many browsers display that pretty HTML.  But a bot can
then also recognize the status code and DTRT.

The webservers are just plain broken, mine included.  I think the
best option is to delay corrupt object reporting to the end of
the download process if you get only one corrupt object and that
corrupt object was actually attainable from a pack.  And in this
case it's just a minor warning:

	Warning: The server appears to not return proper HTTP status
	codes on missing files.  The files were found in one or
	more packs so the download is OK, but the server administrator
	should really fix their server.  If you know the server
	administrator you might want to prod them to do so.

But that's already been suggested and I thought someone worked up
a patch based on that idea?  If not I could try to do so since my
own damn server has the problem.  :-)
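
A minimal sketch of the deferred-reporting idea above, in Python rather
than git's own C.  The class and method names (FetchReport, note_corrupt,
note_recovered, finish) are hypothetical and are not taken from any
patch on the list.  It also assumes a slight generalization: any number
of corrupt loose objects is tolerated as long as every one of them later
turned up in a pack.

```python
from dataclasses import dataclass, field

# Warning text taken from the proposal above.
SOFT_404_WARNING = (
    "Warning: The server appears to not return proper HTTP status "
    "codes on missing files.  The files were found in one or more "
    "packs so the download is OK, but the server administrator "
    "should really fix their server."
)


@dataclass
class FetchReport:
    """Collects corrupt-object events and reports them only at the end."""

    corrupt_loose: set = field(default_factory=set)
    recovered_from_pack: set = field(default_factory=set)

    def note_corrupt(self, sha1_hex: str) -> None:
        # A loose-object download did not decode as a valid object
        # (for example, it was an HTML error page served with 200).
        self.corrupt_loose.add(sha1_hex)

    def note_recovered(self, sha1_hex: str) -> None:
        # The same object was later found inside a downloaded pack.
        if sha1_hex in self.corrupt_loose:
            self.recovered_from_pack.add(sha1_hex)

    def finish(self):
        """Return a warning string, None, or raise if data is missing."""
        unrecovered = self.corrupt_loose - self.recovered_from_pack
        if unrecovered:
            raise RuntimeError(
                "corrupt objects not found in any pack: "
                + ", ".join(sorted(unrecovered))
            )
        if self.corrupt_loose:
            return SOFT_404_WARNING
        return None
```

The point of the design is that a bogus 200 response stops being fatal
at the moment it is seen; it only becomes an error if the object is
still missing once all packs have been tried.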

linux@horizon.com wrote:
> If someone feels ambitious, you can detect this condition automatically
> by searching for a file that you know won't be there and seeing if you
> get a 404 response to that.
> 
> To avoid punishing good servers, it would be nice to defer the test
> until receiving the first corrupted object.
> 
> I'm not sure what the best "object that's not supposed to be there" is.
> It could just be a random hash, or would a malformed object file name
> be better?  Any fixed name has a finite chance of being created by
> someone somewhere, but generating 160-bit random numbers is a PITA on
> non-freenix platforms.
> 
> 
> (As an aside, I suspect this is all caused by Microsoft's "friendly HTML
> error messages" invention.)
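
The probe suggested in the quoted text could look roughly like the
Python sketch below.  The function names and the injected http_status
callable are assumptions made for illustration; the only git fact it
relies on is that a loose object lives at objects/<first 2 hex
digits>/<remaining 38>.  It shows both candidate probe names from this
thread: a random 160-bit name and the fixed '0' x 40.

```python
import secrets
from typing import Callable, Optional

# The fixed candidate from this thread: Perl's '0' x 40.
NULL_SHA1 = "0" * 40


def loose_object_path(sha1_hex: str) -> str:
    """Map a 40-digit hex object name to its loose-object path."""
    return f"objects/{sha1_hex[:2]}/{sha1_hex[2:]}"


def server_hides_404(
    base_url: str,
    http_status: Callable[[str], int],
    probe: Optional[str] = None,
) -> bool:
    """Ask for an object that should not exist and check the status.

    http_status is a caller-supplied function (hypothetical here) that
    fetches a URL and returns the HTTP status code.  Returns True when
    the server answers 200 for the missing object.
    """
    if probe is None:
        # 20 random bytes = 160 bits = 40 hex digits.
        probe = secrets.token_hex(20)
    url = base_url.rstrip("/") + "/" + loose_object_path(probe)
    return http_status(url) == 200
```

To avoid punishing good servers, a caller would run this only after
the first corrupt object shows up, as suggested above.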

-- 
Shawn.
