From: Jakub Narebski <jnareb@gmail.com>
To: "John 'Warthog9' Hawley" <warthog9@eaglescrag.net>
Cc: git@vger.kernel.org, Jakub Narebski <jnareb@gmail.com>
Subject: Re: [PATCH 11/18] gitweb: add isDumbClient() check
Date: Thu, 09 Dec 2010 16:12:33 -0800 (PST) [thread overview]
Message-ID: <m3hbem1o7a.fsf@localhost.localdomain> (raw)
In-Reply-To: <1291931844-28454-12-git-send-email-warthog9@eaglescrag.net>
"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:
> Basic check for the claimed Agent string, if it matches a known
> blacklist (wget and curl currently) don't display the 'Generating...'
> page.
>
> Jakub has mentioned a couple of other possible ways to handle
> this, so if a better way comes along this should be used as a
> wrapper to any better way we can find to deal with this.
>
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
> gitweb/lib/cache.pl | 30 ++++++++++++++++++++++++++++++
> 1 files changed, 30 insertions(+), 0 deletions(-)
>
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index d55b572..5182a94 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -116,6 +116,34 @@ sub isFeedAction {
> return 0; # False
> }
>
> +# There have been a number of requests that things like "dumb" clients, I.E. wget
> +# lynx, links, etc (things that just download, but don't parse the html) actually
> +# work without getting the wonkiness that is the "Generating..." page.
> +#
> +# There's only one good way to deal with this, and that's to read the browser User
> +# Agent string and do matching based on that. This has a whole slew of error cases
> +# and mess, but there's no other way to determine if the "Generating..." page
> +# will break things.
> +#
> +# This assumes the client is not dumb, thus the default behavior is to return
> +# "false" (0) (and eventually the "Generating..." page). If it is a dumb client
> +# return "true" (1)
> +sub isDumbClient {
Please don't use mixedCase, but underline_separated words,
e.g. browser_is_robot(), or client_is_dumb().
> + my($user_agent) = $ENV{'HTTP_USER_AGENT'};
What if $ENV{'HTTP_USER_AGENT'} is unset / undef, e.g. because we are
runing gitweb as a script... which includes running gitweb tests?
> +
> + if(
> + # wget case
> + $user_agent =~ /^Wget/i
> + ||
> + # curl should be excluded I think, probably better safe than sorry
> + $user_agent =~ /^curl/i
> + ){
> + return 1; # True
> + }
> +
> + return 0;
> +}
Compare (note: handcrafted solution is to whitelist, not blacklist):
+sub browser_is_robot {
+ return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
+ if (eval { require HTTP::BrowserDetect; }) {
+ my $browser = HTTP::BrowserDetect->new();
+ return $browser->robot();
+ }
+ # fallback on detecting known web browsers
+ return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
+ # be conservative; if not sure, assume non-interactive
+ return 1;
+}
from
"[PATCHv6 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache"
http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163040
http://repo.or.cz/w/git/jnareb-git.git/commitdiff/48679f7985ccda16dc54fda97790841bab4a0ba2
--
Jakub Narebski
Poland
ShadeHawk on #git
next prev parent reply other threads:[~2010-12-10 0:12 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
2010-12-09 23:30 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 02/18] gitweb: add output buffering and associated functions John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 04/18] gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
2010-12-09 23:33 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
2010-12-09 23:38 ` Jakub Narebski
2010-12-10 2:38 ` J.H.
2010-12-10 13:48 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
2010-12-09 23:46 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
2010-12-09 23:58 ` Jakub Narebski
2010-12-10 2:43 ` J.H.
2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
2010-12-10 0:06 ` Jakub Narebski
2010-12-10 3:39 ` J.H.
2010-12-10 12:10 ` Jakub Narebski
2010-12-10 12:25 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
2010-12-10 0:12 ` Jakub Narebski [this message]
2010-12-10 4:00 ` J.H.
2010-12-11 0:07 ` Junio C Hamano
2010-12-11 0:15 ` Jakub Narebski
2010-12-11 1:15 ` J.H.
2010-12-11 1:40 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
2010-12-10 0:16 ` Jakub Narebski
2010-12-10 0:32 ` Junio C Hamano
2010-12-10 0:47 ` Jakub Narebski
2010-12-10 5:56 ` J.H.
2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
2010-12-10 0:26 ` Jakub Narebski
2010-12-10 6:10 ` J.H.
2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
2010-12-10 0:36 ` Jakub Narebski
2010-12-10 6:18 ` J.H.
2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
2010-12-10 1:01 ` Jakub Narebski
2010-12-10 7:38 ` J.H.
2010-12-10 14:10 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
2010-12-10 1:36 ` Jakub Narebski
2010-12-12 5:25 ` J.H.
2010-12-12 15:17 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
2010-12-10 1:49 ` Jakub Narebski
2010-12-10 8:33 ` J.H.
2010-12-10 20:33 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
2010-12-10 1:56 ` Jakub Narebski
2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
2010-12-10 0:43 ` J.H.
2010-12-10 1:27 ` Jakub Narebski
2010-12-10 0:39 ` Junio C Hamano
2010-12-10 0:45 ` J.H.
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m3hbem1o7a.fsf@localhost.localdomain \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=warthog9@eaglescrag.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.