From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
"Christopher M. Fuhrman" <cfuhrman@panix.com>
Cc: git@vger.kernel.org, Christopher Wilson <cwilson@cdwilson.us>,
Sylvain Rabot <sylvain@abstraction.fr>
Subject: [PATCH] gitweb: Strip non-printable characters from syntax highlighter output
Date: Fri, 16 Sep 2011 14:41:57 +0200
Message-ID: <201109161441.58946.jnareb@gmail.com>
In-Reply-To: <201108270006.19289.jnareb@gmail.com>

The current code, as is, passes control characters, such as form feed
(^L), to highlight, which then passes them through to the browser. User
agents (web browsers) that support 'application/xhtml+xml' usually
require that web pages declared as XHTML and served with this MIME type
be well-formed XML. Unescaped control characters cannot appear within
the contents of a well-formed XML document.

This will cause the browser to display one of the following warnings:

* Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:

   This page contains the following errors:

   error on line 657 at column 38: PCDATA invalid Char value 12
   Below is a rendering of the page up to the first error.

* Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:

   XML Parsing Error: not well-formed
   Location:
   http://path/to/git/repo/blah/blah

Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7
using arch/ia64/kernel/unwind.c from the Linux kernel.

When the syntax highlighter is not used, control characters are replaced
by esc_html(), but with the syntax highlighter they were passed through
to the browser (to_utf8() doesn't remove control characters).

Introduce a sanitize() subroutine which strips forbidden characters but
does not perform HTML escaping, and use it in git_blob() to sanitize
syntax highlighter output for XHTML.

Note that excluding "\t" (U+0009), "\n" (U+000A) and "\r" (U+000D) is
not strictly necessary, at least for what is currently the only call
site: "\t" tabs are replaced by spaces by untabify(), "\n" is stripped
from each line before processing it, and replacing "\r" could be
considered an improvement.

Originally-by: Christopher M. Fuhrman <cfuhrman@panix.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
The commit message is from Christopher, but I have replaced his solution
of stripping non-printable characters via the col(1) program with having
gitweb itself strip characters not allowed in XML.
Christopher, could you check that it fixes your issue?
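
For trying the substitution outside gitweb, here is a minimal standalone
sketch. It is not the patch itself: show_cntrl() is a hypothetical
stand-in for quot_cec() (whose real output differs), the to_utf8() step
is left out, and the sub is named sanitize_sketch() to avoid confusion
with the real one. The test for "\t", "\n" and "\r" is written with
index() here because a successful nested regex match inside the
replacement would reset $1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for gitweb's quot_cec(): show a control
# character as a visible \xNN escape. The real quot_cec() produces
# different output.
sub show_cntrl {
    my $cntrl = shift;
    return sprintf('\\x%02x', ord($cntrl));
}

# Sketch of the sanitize() substitution, without the to_utf8() call:
# keep TAB, LF and CR, and make every other control character printable.
sub sanitize_sketch {
    my $str = shift;

    return undef unless defined $str;

    # index() is used for the membership test so that $1 still holds
    # the captured control character when it is returned.
    $str =~ s|([[:cntrl:]])|(index("\t\n\r", $1) != -1 ? $1 : show_cntrl($1))|eg;
    return $str;
}

# A line with a form feed (U+000C) followed by a tab.
my $line = "int x;\f\tint y;";
print sanitize_sketch($line), "\n";
```

With this input the form feed comes out as the four literal characters
\x0c, while the tab is left alone.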
gitweb/gitweb.perl | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 70a576a..c28b847 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1517,6 +1517,17 @@ sub esc_path {
return $str;
}
+# Sanitize for use in XHTML + application/xhtml+xml (valid XML 1.0)
+sub sanitize {
+ my $str = shift;
+
+ return undef unless defined $str;
+
+ $str = to_utf8($str);
+ $str =~ s|([[:cntrl:]])|(index("\t\n\r", $1) != -1 ? $1 : quot_cec($1))|eg;
+ return $str;
+}
+
# Make control characters "printable", using character escape codes (CEC)
sub quot_cec {
my $cntrl = shift;
@@ -6484,7 +6495,8 @@ sub git_blob {
$nr++;
$line = untabify($line);
printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
- $nr, esc_attr(href(-replay => 1)), $nr, $nr, $syntax ? to_utf8($line) : esc_html($line, -nbsp=>1);
+ $nr, esc_attr(href(-replay => 1)), $nr, $nr,
+ $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
}
}
close $fd
--
1.7.6