git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: "John 'Warthog9' Hawley" <warthog9@eaglescrag.net>
Cc: git@vger.kernel.org, Jakub Narebski <jnareb@gmail.com>
Subject: Re: [PATCH 11/18] gitweb: add isDumbClient() check
Date: Thu, 09 Dec 2010 16:12:33 -0800 (PST)	[thread overview]
Message-ID: <m3hbem1o7a.fsf@localhost.localdomain> (raw)
In-Reply-To: <1291931844-28454-12-git-send-email-warthog9@eaglescrag.net>

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> Basic check for the claimed Agent string, if it matches a known
> blacklist (wget and curl currently) don't display the 'Generating...'
> page.
> 
> Jakub has mentioned a couple of other possible ways to handle
> this, so if a better way comes along this should be used as a
> wrapper to any better way we can find to deal with this.
> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/lib/cache.pl |   30 ++++++++++++++++++++++++++++++
>  1 files changed, 30 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index d55b572..5182a94 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -116,6 +116,34 @@ sub isFeedAction {
>  	return 0;		# False
>  }
>  
> +# There have been a number of requests that things like "dumb" clients, I.E. wget
> +# lynx, links, etc (things that just download, but don't parse the html) actually
> +# work without getting the wonkiness that is the "Generating..." page.
> +#
> +# There's only one good way to deal with this, and that's to read the browser User
> +# Agent string and do matching based on that.  This has a whole slew of error cases
> +# and mess, but there's no other way to determine if the "Generating..." page
> +# will break things.
> +#
> +# This assumes the client is not dumb, thus the default behavior is to return
> +# "false" (0) (and eventually the "Generating..." page).  If it is a dumb client
> +# return "true" (1)
> +sub isDumbClient {

Please don't use mixedCase, but underline_separated words,
e.g. browser_is_robot(), or client_is_dumb().

> +	my($user_agent) = $ENV{'HTTP_USER_AGENT'};

What if $ENV{'HTTP_USER_AGENT'} is unset / undef, e.g. because we are
runing gitweb as a script... which includes running gitweb tests?

> +	
> +	if(
> +		# wget case
> +		$user_agent =~ /^Wget/i
> +		||
> +		# curl should be excluded I think, probably better safe than sorry
> +		$user_agent =~ /^curl/i
> +	  ){
> +		return 1;	# True
> +	}
> +
> +	return 0;
> +}

Compare (note: handcrafted solution is to whitelist, not blacklist):

+sub browser_is_robot {
+       return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
+       if (eval { require HTTP::BrowserDetect; }) {
+               my $browser = HTTP::BrowserDetect->new();
+               return $browser->robot();
+       }
+       # fallback on detecting known web browsers
+       return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
+       # be conservative; if not sure, assume non-interactive
+       return 1;
+}

from

  "[PATCHv6 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache"
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163040
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/48679f7985ccda16dc54fda97790841bab4a0ba2

-- 
Jakub Narebski
Poland
ShadeHawk on #git

  reply	other threads:[~2010-12-10  0:12 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
2010-12-09 23:30   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 02/18] gitweb: add output buffering and associated functions John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 04/18] gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
2010-12-09 23:33   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
2010-12-09 23:38   ` Jakub Narebski
2010-12-10  2:38     ` J.H.
2010-12-10 13:48       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
2010-12-09 23:46   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
2010-12-09 23:58   ` Jakub Narebski
2010-12-10  2:43     ` J.H.
2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
2010-12-10  0:06   ` Jakub Narebski
2010-12-10  3:39     ` J.H.
2010-12-10 12:10       ` Jakub Narebski
2010-12-10 12:25         ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
2010-12-10  0:12   ` Jakub Narebski [this message]
2010-12-10  4:00     ` J.H.
2010-12-11  0:07       ` Junio C Hamano
2010-12-11  0:15         ` Jakub Narebski
2010-12-11  1:15           ` J.H.
2010-12-11  1:40             ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
2010-12-10  0:16   ` Jakub Narebski
2010-12-10  0:32     ` Junio C Hamano
2010-12-10  0:47       ` Jakub Narebski
2010-12-10  5:56       ` J.H.
2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
2010-12-10  0:26   ` Jakub Narebski
2010-12-10  6:10     ` J.H.
2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
2010-12-10  0:36   ` Jakub Narebski
2010-12-10  6:18     ` J.H.
2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
2010-12-10  1:01   ` Jakub Narebski
2010-12-10  7:38     ` J.H.
2010-12-10 14:10       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
2010-12-10  1:36   ` Jakub Narebski
2010-12-12  5:25     ` J.H.
2010-12-12 15:17       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
2010-12-10  1:49   ` Jakub Narebski
2010-12-10  8:33     ` J.H.
2010-12-10 20:33       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
2010-12-10  1:56   ` Jakub Narebski
2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
2010-12-10  0:43   ` J.H.
2010-12-10  1:27     ` Jakub Narebski
2010-12-10  0:39 ` Junio C Hamano
2010-12-10  0:45   ` J.H.

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m3hbem1o7a.fsf@localhost.localdomain \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=warthog9@eaglescrag.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).