All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Pavan Kumar Sunkara <pavan.sss1991@gmail.com>
Cc: git@vger.kernel.org, Christian Couder <chriscool@tuxfamily.org>,
	Petr Baudis <pasky@suse.cz>,
	Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Subject: Re: [RFC/PATCH] gitweb: Create Gitweb::Escape module
Date: Sat, 5 Jun 2010 11:22:10 +0200	[thread overview]
Message-ID: <201006051122.12565.jnareb@gmail.com> (raw)
In-Reply-To: <1275642437-4846-1-git-send-email-pavan.sss1991@gmail.com>

On Fri, 4 June 2010, Pavan Kumar Sunkara wrote:

> Create a Gitweb::Escape module in 'gitweb/lib/Gitweb/Escape.pm'
> to store all the quoting/unquoting and escaping subroutines
> regarding the gitweb.perl script.

<del>
I would ask if it wouldn't be simpler to just have Gitweb::Util
for all auxiliary / helper subroutines...

...if not for the fact that to avoid circular dependencies you would
have, most probably, split this module further into
 * Gitweb::Escape::CGI
 * Gitweb::Escape::HTML
and perhaps move 'unquote' to Gitweb::Git or Gitweb::Escape::Git.
Then Gitweb::Escape could just import both.
</del>

Errr... sorry.  I guess that either Gitweb::Config nor Gitweb::Request
needs to include Gitweb::Escape.

Note that 'Escape' part of name is not entirely correct, because some of
subroutines mentioned below deal with _quoting_ / _unquoting_, not with
escaping / unescaping.

> 
> Subroutines moved:
> 	to_utf8
> 	esc_param
> 	esc_url
> 	esc_html
> 	esc_path
> 	quot_cec
> 	quot_upr
> 	unquote
> 	untabify
> 
> Subroutines yet to move: (Contains not yet packaged subs & vars)
> 	None
> 
> Update gitweb/Makefile to install gitweb modules alongside gitwe
> 
> Signed-off-by: Pavan Kumar Sunkara <pavan.sss1991@gmail.com>
> ---
>  gitweb/Makefile             |    1 +
>  gitweb/gitweb.perl          |  157 +--------------------------------------
>  gitweb/lib/Gitweb/Escape.pm |  175 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 177 insertions(+), 156 deletions(-)
>  create mode 100644 gitweb/lib/Gitweb/Escape.pm

It's a pity that git cannot tell us here that it is straight code
movement (moving code block from one file to the other)...

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index b30ac48..9e54983 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -12,7 +12,6 @@ use warnings;
>  use CGI qw(:standard :escapeHTML -nosticky);
>  use CGI::Util qw(unescape);
>  use CGI::Carp qw(fatalsToBrowser set_message);
> -use Encode;
>  use Fcntl ':mode';
>  use File::Find qw();
>  use File::Basename qw(basename);
> @@ -27,6 +26,7 @@ use lib __DIR__ . "/lib";
>  
>  use Gitweb::Config;
>  use Gitweb::Request;
> +use Gitweb::Escape;
>  
>  BEGIN {
>  	CGI->compile() if $ENV{'MOD_PERL'};
> @@ -762,161 +762,6 @@ sub validate_project {
>  	}
>  }
>  
> -# decode sequences of octets in utf8 into Perl's internal form,
> -# which is utf-8 with utf8 flag set if needed.  gitweb writes out
> -# in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning
> -sub to_utf8 {

[cut]

> diff --git a/gitweb/lib/Gitweb/Escape.pm b/gitweb/lib/Gitweb/Escape.pm
> new file mode 100644
> index 0000000..f6ea209
> --- /dev/null
> +++ b/gitweb/lib/Gitweb/Escape.pm
> @@ -0,0 +1,175 @@
> +#!/usr/bin/perl
> +#
> +# Gitweb::Escape -- gitweb's quoting/unquoting, escaping package
> +#
> +# This program is licensed under the GPLv2
> +
> +package Gitweb::Escape;
> +
> +use strict;
> +use warnings;
> +use Exporter qw(import);
> +
> +our @ISA = qw(Exporter);

This line is not necessary if you are using 'use Exporter qw(import);'


This code:

  use Exporter;
  our @ISA = qw(Exporter);

is equivalent to (safer)

  use Exporter;
  use base qw(Exporter);

The section "Exporting without inheriting from Exporter" of
Exporter(3pm) manpage says that using

  use Exporter qw(import);

allows to avoid inheriting our package from Exporter.

> +our @EXPORT = qw(&to_utf8 &esc_param &esc_url &esc_html &esc_path &quot_cec &quot_upr &unquote &untabify);

Please try to not introduce such long likes.  Perhaps:

  +our @EXPORT = qw(&to_utf8 &esc_param &esc_url &esc_html &esc_path
                    &quot_cec &quot_upr &unquote &untabify);

Also ampersands are not needed there, and I think hat they are not used
usually, so it would be:

  +our @EXPORT = qw(to_utf8 esc_param esc_url esc_html esc_path
                    quot_cec quot_upr unquote untabify);

But note also comment below about splitting this file further.

> +
> +use Encode;

Perhaps empty line between 'use Exporter;' (using outside Perl modules)
and 'use Gitweb::Config;' would help readibility of code, hmmm...?

Also, don't you need 'use CGI;' or 'use CGI qw(:escapeHTML);' here, to
have access to CGI::escape (to be more exact CGI::Util::escape, but
imported by CGI to CGI::escape) and CGI::escapeHTML?

> +use Gitweb::Config;

I personally would say here explicitely

  +use Gitweb::Config qw($fallback_encoding);

to make it clear that we need only this single one config variable.

> +use Gitweb::Request;

As it is shown below, this is not necessary.

> +
> +# decode sequences of octets in utf8 into Perl's internal form,
> +# which is utf-8 with utf8 flag set if needed.  gitweb writes out
> +# in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning

s/at beginning/at beginning of XXX/;

> +sub to_utf8 {
> +	my $str = shift;
> +	return undef unless defined $str;
> +	if (utf8::valid($str)) {
> +		utf8::decode($str);
> +		return $str;
> +	} else {
> +		return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> +	}
> +}

This single one uses $fallback_encoding from Gitweb::Config.

> +
> +# quote unsafe chars, but keep the slash, even when it's not
> +# correct, but quoted slashes look too horrible in bookmarks
> +sub esc_param {
> +	my $str = shift;
> +	return undef unless defined $str;
> +	$str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
> +	$str =~ s/ /\+/g;
> +	return $str;
> +}
> +
> +# quote unsafe chars in whole URL, so some charactrs cannot be quoted
> +sub esc_url {
> +	my $str = shift;
> +	return undef unless defined $str;
> +	$str =~ s/([^A-Za-z0-9\-_.~();\/;?:@&=])/sprintf("%%%02X", ord($1))/eg;
> +	$str =~ s/\+/%2B/g;
> +	$str =~ s/ /\+/g;
> +	return $str;
> +}

Hmmm... why this difference?  esc_url() should differ from esc_param()
only in that esc_url() escapes less (it should not escape query
parameter delimites).  It seems like 452e225 (gitweb: fix esc_param,
2009-10-13) by Giuseppe Bilotta missed fixing esc_url(), isn't it?

Might be worth fixing upfront (in a separate commit, of course).  Could
you do this, please?

> +
> +# replace invalid utf8 character with SUBSTITUTION sequence
> +sub esc_html {
> +	my $str = shift;
> +	my %opts = @_;
> +
> +	return undef unless defined $str;
> +
> +	$str = to_utf8($str);
> +	$str = $cgi->escapeHTML($str);

We can use 'CGI::escapeHTML;' here, and avoid dependency on
Gitweb::Request.

This change probably should be done in a separate cleanup patch,
preceding this one.

> +	if ($opts{'-nbsp'}) {
> +		$str =~ s/ /&nbsp;/g;
> +	}
> +	$str =~ s|([[:cntrl:]])|(($1 ne "\t") ? quot_cec($1) : $1)|eg;
> +	return $str;
> +}
> +
> +# quote control characters and escape filename to HTML
> +sub esc_path {
> +	my $str = shift;
> +	my %opts = @_;
> +
> +	return undef unless defined $str;
> +
> +	$str = to_utf8($str);
> +	$str = $cgi->escapeHTML($str);

Same here.

> +	if ($opts{'-nbsp'}) {
> +		$str =~ s/ /&nbsp;/g;
> +	}
> +	$str =~ s|([[:cntrl:]])|quot_cec($1)|eg;
> +	return $str;
> +}

Those two subroutines use to_utf8(), which in turn uses
$fallback_encoding.

> +
> +# Make control characters "printable", using character escape codes (CEC)
> +sub quot_cec {
[...]
> +}
> +
> +# Alternatively use unicode control pictures codepoints,
> +# Unicode "printable representation" (PR)
> +sub quot_upr {
[...]
> +}

> +# git may return quoted and escaped filenames
> +sub unquote {
[...]
> +}

Note that this subroutine is about neither CGI nor HTML, but about
unquoting and unescaping output of git commands.  Just FYI.

> +
> +# escape tabs (convert tabs to spaces)
> +sub untabify {
> +	my $line = shift;
> +
> +	while ((my $pos = index($line, "\t")) != -1) {
> +		if (my $count = (8 - ($pos % 8))) {
> +			my $spaces = ' ' x $count;
> +			$line =~ s/\t/$spaces/;
> +		}
> +	}
> +
> +	return $line;
> +}
> +
> +1;

Good work!
-- 
Jakub Narebski
Poland

      reply	other threads:[~2010-06-05  9:20 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-04  9:07 [RFC/PATCH] gitweb: Create Gitweb::Escape module Pavan Kumar Sunkara
2010-06-05  9:22 ` Jakub Narebski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201006051122.12565.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=giuseppe.bilotta@gmail.com \
    --cc=pasky@suse.cz \
    --cc=pavan.sss1991@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.