From: Jakub Narebski <jnareb@gmail.com>
To: "John 'Warthog9' Hawley" <warthog9@eaglescrag.net>
Cc: git@vger.kernel.org, Jakub Narebski <jnareb@gmail.com>
Subject: Re: [PATCH 11/18] gitweb: add isDumbClient() check
Date: Thu, 09 Dec 2010 16:12:33 -0800 (PST)
Message-ID: <m3hbem1o7a.fsf@localhost.localdomain>
In-Reply-To: <1291931844-28454-12-git-send-email-warthog9@eaglescrag.net>
"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:
> Basic check for the claimed Agent string, if it matches a known
> blacklist (wget and curl currently) don't display the 'Generating...'
> page.
>
> Jakub has mentioned a couple of other possible ways to handle
> this, so if a better way comes along this should be used as a
> wrapper to any better way we can find to deal with this.
>
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
> gitweb/lib/cache.pl | 30 ++++++++++++++++++++++++++++++
> 1 files changed, 30 insertions(+), 0 deletions(-)
>
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index d55b572..5182a94 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -116,6 +116,34 @@ sub isFeedAction {
> return 0; # False
> }
>
> +# There have been a number of requests that things like "dumb" clients, I.E. wget
> +# lynx, links, etc (things that just download, but don't parse the html) actually
> +# work without getting the wonkiness that is the "Generating..." page.
> +#
> +# There's only one good way to deal with this, and that's to read the browser User
> +# Agent string and do matching based on that. This has a whole slew of error cases
> +# and mess, but there's no other way to determine if the "Generating..." page
> +# will break things.
> +#
> +# This assumes the client is not dumb, thus the default behavior is to return
> +# "false" (0) (and eventually the "Generating..." page). If it is a dumb client
> +# return "true" (1)
> +sub isDumbClient {
Please don't use mixedCase; use underscore_separated words instead,
e.g. browser_is_robot(), or client_is_dumb().
> + my($user_agent) = $ENV{'HTTP_USER_AGENT'};
What if $ENV{'HTTP_USER_AGENT'} is unset / undef, e.g. because we are
running gitweb as a script... which includes running gitweb tests?
> +
> + if(
> + # wget case
> + $user_agent =~ /^Wget/i
> + ||
> + # curl should be excluded I think, probably better safe than sorry
> + $user_agent =~ /^curl/i
> + ){
> + return 1; # True
> + }
> +
> + return 0;
> +}
Compare with the following (note: the handcrafted fallback whitelists
known web browsers, instead of blacklisting known robots):
+sub browser_is_robot {
+ return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
+ if (eval { require HTTP::BrowserDetect; }) {
+ my $browser = HTTP::BrowserDetect->new();
+ return $browser->robot();
+ }
+ # fallback on detecting known web browsers
+ return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
+ # be conservative; if not sure, assume non-interactive
+ return 1;
+}
from
"[PATCHv6 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache"
http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163040
http://repo.or.cz/w/git/jnareb-git.git/commitdiff/48679f7985ccda16dc54fda97790841bab4a0ba2
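If the blacklist approach from the patch is kept, the unset-User-Agent
case can still be guarded against.  A minimal sketch, not taken from
either series: the name client_is_dumb() is only the rename suggested
above, and treating a missing User-Agent as a dumb client is an
assumption that mirrors what browser_is_robot() does.

```perl
use strict;
use warnings;

# Hypothetical rename of isDumbClient(); returns 1 for clients that
# should not be shown the "Generating..." page, 0 otherwise.
sub client_is_dumb {
	my $user_agent = $ENV{'HTTP_USER_AGENT'};

	# No User-Agent at all, e.g. gitweb run as a script or from the
	# test suite.  Assumption: treat this as non-interactive.
	return 1 unless defined $user_agent;

	# Same blacklist as in the patch: wget and curl.
	return 1 if $user_agent =~ /^(?:Wget|curl)/i;

	# Anything else is assumed to be an interactive browser.
	return 0;
}
```

Checking defined() before the match also avoids the "Use of
uninitialized value" warning that the original regexp match would
trigger under "use warnings".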
--
Jakub Narebski
Poland
ShadeHawk on #git