git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gitweb: Sanitize title attribute in format_subject_html
@ 2009-05-22 15:35 Jakub Narebski
  2009-05-22 15:39 ` [PATCH (lite)] " Jakub Narebski
  2009-05-22 15:55 ` [RFC/PATCH (do not use)] gitweb: Introduce and use esc_attr to escape values of tag attributes Jakub Narebski
  0 siblings, 2 replies; 3+ messages in thread
From: Jakub Narebski @ 2009-05-22 15:35 UTC (permalink / raw)
  To: git; +Cc: Paul Gortmaker

Replace control characters with question mark '?' (like in
chop_and_esc_str).


A little background: some web browsers turn on strict (and
unforgiving) XML validating mode for XHTML documents served using
application/xhtml+xml content type.  This means among others that
control characters are forbidden to appear in gitweb output.

CGI.pm does by default slight escaping (using simple_escape subroutine
from CGI::Util) of all _attribute_ values (depending on the value of
autoEscape, by default on).  This escaping, at least in CGI.pm version
3.10 (most current version at CPAN is 3.43), is minimal: only '"',
'&', '<' and '>' are escaped using named HTML entity references
(&quot;, &amp;, &lt; and &gt; respectively).  But simple_escape does
not do escaping of control characters such as ^X which are invalid in
XHTML (in strict mode).

If by some accident commit message do contain some control character
in first 50 characters (more or less) of first line of commit message,
and this line is longer than 50 characters (so gitweb shortens it for
display), then gitweb would put this control character in title
attribute (and CGI.pm would not remove them).  The tag _contents_ is
safe because it is escaped using esc_html() explicitly, and it
replaces control characters by their printable representation.


While at it: chop_and_escape_str doesn't need capturing group.

Noticed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This issue first appeared (with a wrong solution) month ago in
http://thread.gmane.org/gmane.comp.version-control.git/116755

I'm sorry Paul that it took so long to fix it.


This patch will be followed by 'lite' version, with minimal commit
message (this one is a bit long), and with failed attempt using
esc_attr.

 gitweb/gitweb.perl |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 05702e4..1e7e2d8 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1236,7 +1236,7 @@ sub chop_and_escape_str {
 	if ($chopped eq $str) {
 		return esc_html($chopped);
 	} else {
-		$str =~ s/([[:cntrl:]])/?/g;
+		$str =~ s/[[:cntrl:]]/?/g;
 		return $cgi->span({-title=>$str}, esc_html($chopped));
 	}
 }
@@ -1459,6 +1459,7 @@ sub format_subject_html {
 	$extra = '' unless defined($extra);
 
 	if (length($short) < length($long)) {
+		$long =~ s/[[:cntrl:]]/?/g;
 		return $cgi->a({-href => $href, -class => "list subject",
 		                -title => to_utf8($long)},
 		       esc_html($short) . $extra);
-- 
1.6.3.1

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH (lite)] gitweb: Sanitize title attribute in format_subject_html
  2009-05-22 15:35 [PATCH] gitweb: Sanitize title attribute in format_subject_html Jakub Narebski
@ 2009-05-22 15:39 ` Jakub Narebski
  2009-05-22 15:55 ` [RFC/PATCH (do not use)] gitweb: Introduce and use esc_attr to escape values of tag attributes Jakub Narebski
  1 sibling, 0 replies; 3+ messages in thread
From: Jakub Narebski @ 2009-05-22 15:39 UTC (permalink / raw)
  To: git; +Cc: Paul Gortmaker

Replace control characters with question mark '?' (like in
chop_and_esc_str).

While at it: chop_and_escape_str doesn't need capturing group.

Noticed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This version has more sane ratio of commit message length to actual
change...

 gitweb/gitweb.perl |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 05702e4..1e7e2d8 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1236,7 +1236,7 @@ sub chop_and_escape_str {
 	if ($chopped eq $str) {
 		return esc_html($chopped);
 	} else {
-		$str =~ s/([[:cntrl:]])/?/g;
+		$str =~ s/[[:cntrl:]]/?/g;
 		return $cgi->span({-title=>$str}, esc_html($chopped));
 	}
 }
@@ -1459,6 +1459,7 @@ sub format_subject_html {
 	$extra = '' unless defined($extra);
 
 	if (length($short) < length($long)) {
+		$long =~ s/[[:cntrl:]]/?/g;
 		return $cgi->a({-href => $href, -class => "list subject",
 		                -title => to_utf8($long)},
 		       esc_html($short) . $extra);
-- 
1.6.3.1

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [RFC/PATCH (do not use)] gitweb: Introduce and use esc_attr to escape values of tag attributes
  2009-05-22 15:35 [PATCH] gitweb: Sanitize title attribute in format_subject_html Jakub Narebski
  2009-05-22 15:39 ` [PATCH (lite)] " Jakub Narebski
@ 2009-05-22 15:55 ` Jakub Narebski
  1 sibling, 0 replies; 3+ messages in thread
From: Jakub Narebski @ 2009-05-22 15:55 UTC (permalink / raw)
  To: git

CGI.pm does by default slight escaping (simple_escape from CGI::Util)
of all _attribute_ values (depending on the value of autoEscape(), by
default on).

This escaping, at least in CGI.pm version 3.10 (most current version
at CPAN is 3.43), is minimal: only '"', '&', '<' and '>' are escaped
using named HTML entity references (&quot;, &amp;, &lt; and &gt;
respectively).  simple_escape does not do escaping of control
characters such as ^X which are invalid in XHTML (in strict mode).
Note that IIRC escaping '<' and '>' in attributes is not strictly
necessary.


New 'esc_attr' subroutine can be used to do more complete escaping of
attribute values, including dealing with control characters (forbidden
in XHTML in strict validating mode); currently it replaces all control
characters by '?' question mark.  For esc_attr() to work correctly you
have to turn off CGI.pm autoescaping to avoid double escaping.

While at it chop_and_esc_str now uses esc_attr instead of simply
replacing control characters with '?' by itself.


The problem is that when autoescaping is turned off we would have to
esc_attr all attributes containing user input (this includes result of
href() subroutine), or risk either double escaping, or not escaping
some fragment.


Alternate solution include:
* stripping of control characters before input is passed as attribute
  value, as it was done in chop_and_esc_str; this would include at
  minimum format_subject_html
* replacing either make_attributes from CGI.pm, or simple_escape
  subrotuine it uses, making it escape characters better


Noticed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>

http://thread.gmane.org/gmane.comp.version-control.git/116755/focus=117539
---
This is abandoned version. Still... perhaps esc_attr would be of use
to somebody... Also as negative example in archives.

 gitweb/gitweb.perl |   39 +++++++++++++++++++++++++++++++++------
 1 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 05702e4..e505f2f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1027,7 +1027,7 @@ sub esc_param {
 	return $str;
 }
 
-# quote unsafe chars in whole URL, so some charactrs cannot be quoted
+# quote unsafe chars in whole URL, so some characters cannot be quoted
 sub esc_url {
 	my $str = shift;
 	$str =~ s/([^A-Za-z0-9\-_.~();\/;?:@&=])/sprintf("%%%02X", ord($1))/eg;
@@ -1036,6 +1036,26 @@ sub esc_url {
 	return $str;
 }
 
+# quote and escape tag attribute values; autoEscape has to be turned off
+# otherwise you would get double escaping for '&' ("escape" character)
+sub esc_attr {
+	my $str = shift;
+	return $str unless defined $str;
+
+	my %ent = ( # named HTML entities
+		'"' => '&quot;',
+		'&' => '&amp;',
+		# those are not strictly necessary
+		'<' => '&lt;',
+		'>' => '&gt;',
+	);
+	$str = to_utf8($str);
+	$str =~ s|([\"&<>])|$ent{$1}|eg;
+	$str =~ s|([[:cntrl:]])|?|eg; # alternative would be use quot_xxx($1)
+
+	return $str;
+}
+
 # replace invalid utf8 character with SUBSTITUTION sequence
 sub esc_html {
 	my $str = shift;
@@ -1236,8 +1256,10 @@ sub chop_and_escape_str {
 	if ($chopped eq $str) {
 		return esc_html($chopped);
 	} else {
-		$str =~ s/([[:cntrl:]])/?/g;
-		return $cgi->span({-title=>$str}, esc_html($chopped));
+		my $autoescape = $cgi->autoEscape(undef);
+		my $result = $cgi->span({-title=>esc_attr($str)}, esc_html($chopped));
+		$cgi->autoEscape($autoescape); # restore original value
+		return $result;
 	}
 }
 
@@ -1458,14 +1480,19 @@ sub format_subject_html {
 	my ($long, $short, $href, $extra) = @_;
 	$extra = '' unless defined($extra);
 
+	my $ret = '';
 	if (length($short) < length($long)) {
-		return $cgi->a({-href => $href, -class => "list subject",
-		                -title => to_utf8($long)},
+		my $autoescape = $cgi->autoEscape(undef);
+		# or just replace s/([[:cntrl:]])/?/g in -title
+		$ret = $cgi->a({-href => $href, -class => "list subject",
+		                -title => esc_attr($long)},
 		       esc_html($short) . $extra);
+		$cgi->autoEscape($autoescape); # restore original value
 	} else {
-		return $cgi->a({-href => $href, -class => "list subject"},
+		$ret = $cgi->a({-href => $href, -class => "list subject"},
 		       esc_html($long)  . $extra);
 	}
+	return $ret;
 }
 
 # format git diff header line, i.e. "diff --(git|combined|cc) ..."
-- 
1.6.3.1

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2009-05-22 15:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-22 15:35 [PATCH] gitweb: Sanitize title attribute in format_subject_html Jakub Narebski
2009-05-22 15:39 ` [PATCH (lite)] " Jakub Narebski
2009-05-22 15:55 ` [RFC/PATCH (do not use)] gitweb: Introduce and use esc_attr to escape values of tag attributes Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).