git.vger.kernel.org archive mirror
From: "J.H." <warthog9@eaglescrag.net>
To: Olaf Alders <olaf@wundersolutions.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	Jakub Narebski <jnareb@gmail.com>,
	git@vger.kernel.org,
	"John 'Warthog9' Hawley" <warthog9@kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Petr Baudis <pasky@ucw.cz>,
	admin@repo.or.cz
Subject: Re: [RFC] Implementing gitweb output caching - issues to solve
Date: Thu, 09 Dec 2010 20:46:55 -0800
Message-ID: <4D01B0BF.1010609@eaglescrag.net>
In-Reply-To: <88CF82F1-0363-47B4-8C6F-AE4A2DA1714B@wundersolutions.com>

>> Interesting.  http://www.user-agents.org/ seems to suggest that many
>> robots do use Mozilla (though I don't think it's worth bending over
>> backwards to help them see the page correctly).

If a robot reports itself and we don't know about it, I'm fine with
giving it the 'Generating...' page as opposed to what it's expecting.
There are fewer robots and things of that nature that won't handle the
meta refresh than there are people clicking with eyeballs on a screen.

>> HTTP::BrowserDetect uses a blacklist as far as I can tell.  Maybe in
>> the long term it would be nice to add a whitelist ->human() method.
>>
>> Cc-ing Olaf Alders for ideas.
> 
> Thanks for including me in this.  :)  I'm certainly open to patching the
> module, but I'm not 100% clear on how you would want to implement this.
> How is ->is_human different from !->is_robot?  To clarify, I should say
> that from the snippet above, I'm not 100% clear on what the problem is
> which needs to be solved.

At this point I don't really see an issue with HTTP::BrowserDetect's
robot() function, and I agree with human = !->is_robot.

One thing I would like to see is some way to add to the list of things
to check for.  As you are probably aware, there are more agents out
there than what you have set up, so I'm moving forward and handling it
with the following:

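# Returns 1 for robots and other clients that should not get the
# 'Generating...' page.  @additional_dumb_clients holds extra
# User-Agent substrings and is defined elsewhere.
sub is_dumb_client {
        # The header may be missing; avoid an undef warning from lc.
        my $user_agent = defined $ENV{'HTTP_USER_AGENT'}
                ? lc $ENV{'HTTP_USER_AGENT'} : '';

        my $browser_detect = HTTP::BrowserDetect->new($user_agent);

        return 1 if ( $browser_detect->robot() );

        # Case-insensitive substring match against the extra list.
        foreach my $adc ( @additional_dumb_clients ) {
                return 1 if ( index( $user_agent, lc $adc ) != -1 );
        }

        return 0;
}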

which could be simplified if there was just some way to do

        my($user_agent) = lc $ENV{'HTTP_USER_AGENT'};

        my $browser_detect = HTTP::BrowserDetect->new($user_agent);

        $browser_detect->add_robots( @array );

        return 1 if ( $browser_detect->robot() );

Not sure that particularly generalizes, and honestly it's only 4 lines
of code to add the additional checks.

- John 'Warthog9' Hawley

Thread overview: 9+ messages
2010-11-04 16:21 [RFC] Implementing gitweb output caching - issues to solve Jakub Narebski
2010-12-09  1:31 ` J.H.
2010-12-09  5:22   ` Junio C Hamano
2010-12-09  5:28     ` J.H.
2010-12-09 22:30   ` Jakub Narebski
2010-12-09 22:52     ` Jonathan Nieder
2010-12-10  3:17       ` Olaf Alders
2010-12-10  4:11         ` Jonathan Nieder
2010-12-10  4:46         ` J.H. [this message]
