* [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1)
@ 2011-08-16 18:16 Christopher M. Fuhrman
2011-08-16 20:30 ` J.H.
From: Christopher M. Fuhrman @ 2011-08-16 18:16 UTC
To: git; +Cc: jnareb, cwilson, sylvain, Christopher M. Fuhrman
From: "Christopher M. Fuhrman" <cfuhrman@panix.com>
The current code, as is, passes control characters, such as form-feed
(^L), to highlight, which then passes them through to the browser. This
causes the browser to display one of the following errors:
Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:

    This page contains the following errors:

    error on line 657 at column 38: PCDATA invalid Char value 12
    Below is a rendering of the page up to the first error.

Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:

    XML Parsing Error: not well-formed
    Location:
    http://path/to/git/repo/blah/blah

Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7
using arch/ia64/kernel/unwind.c from the Linux kernel.
Strip non-printable control-characters by piping the output produced
by git-cat-file(1) to col(1) as follows:
git cat-file blob deadbeef314159 | col -bx | highlight <args>
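For context, the browser errors arise because XML 1.0 allows only tab, line feed, and carriage return among the C0 control characters, so a form feed (value 12) makes an XHTML page not well-formed. The following is a minimal Python sketch of that rule for illustration only; it is not part of the patch (gitweb itself is Perl), and the name `strip_xml_invalid` is hypothetical. Unlike `col -bx`, it leaves tabs and other whitespace untouched.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_invalid(text: str) -> str:
    """Return text with characters that XML 1.0 forbids removed.

    Hypothetical helper for illustration; gitweb does not define it.
    """
    return _XML_INVALID.sub("", text)
```

Applied to a source line that carries a form feed, this would drop the ^L and keep everything else, including tabs.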
Tested under OpenSuSE 11.4 and NetBSD 5.1 (perl 5.12.3 and perl 5.12.2,
respectively) with Safari, Firefox, and Google Chrome.
Signed-off-by: Christopher M. Fuhrman <cfuhrman@panix.com>
---
For an example of this bug in action, see:
* http://git.fuhrbear.com/~cfuhrman/?p=linux/.git;a=blob;f=arch/alpha/kernel/core_titan.c;h=219bf271c0ba2e5f2d668af707df57fbbd00ccfd;hb=HEAD
* http://git.fuhrbear.com/~cfuhrman/?p=linux/.git;a=blob;f=arch/ia64/kernel/unwind.c;h=fed6afa2e8a9014e65229e51e64fa4b1c13cc284;hb=HEAD
WRT the col(1) command, I've verified that the binary is installed in
/usr/bin on OpenSuSE, NetBSD, OpenBSD, Solaris 10, and AIX. This
patch assumes that /usr/bin is in $PATH.
Cheers!
gitweb/gitweb.perl | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 81dacf2..38d5d4e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3656,6 +3656,7 @@ sub run_highlighter {
 	close $fd;
 	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
+	          "col -bx | ".
 	          quote_command($highlight_bin).
 	          " --replace-tabs=8 --fragment --syntax $syntax |"
 		or die_error(500, "Couldn't open file or run syntax highlighter");
--
1.7.5.4
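
The one-line change above splices an extra stage into a shell pipeline that is assembled as a string. A rough Python sketch of the same assembly is shown here purely for illustration. The function `build_highlight_pipeline` and its parameters are hypothetical, `shlex.quote` stands in for gitweb's `quote_command`, and the sketch omits whatever extra arguments gitweb's `git_cmd()` supplies.

```python
import shlex


def build_highlight_pipeline(blob_hash, syntax, git_bin="git",
                             highlight_bin="highlight", strip_control=True):
    """Assemble the shell pipeline as a single string (illustrative only)."""
    stages = [[git_bin, "cat-file", "blob", blob_hash]]
    if strip_control:
        # The stage added by the patch: drop backspaces (-b) and
        # emit spaces instead of tabs (-x).
        stages.append(["col", "-bx"])
    stages.append([highlight_bin, "--replace-tabs=8", "--fragment",
                   "--syntax", syntax])
    # Quote every argument, then join the stages with pipes.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in stage) for stage in stages
    )
```

Calling `build_highlight_pipeline("deadbeef314159", "c")` would yield a string with the `col -bx` stage sitting between `git cat-file` and `highlight`, matching the command line given in the commit message.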
* Re: [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1)
From: J.H. @ 2011-08-16 20:30 UTC
To: Christopher M. Fuhrman; +Cc: git, jnareb, cwilson, sylvain
On 08/16/2011 11:16 AM, Christopher M. Fuhrman wrote:
> From: "Christopher M. Fuhrman" <cfuhrman@panix.com>
>
> The current code, as is, passes control characters, such as form-feed
> (^L), to highlight, which then passes them through to the browser. This
> causes the browser to display one of the following errors:
>
> Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:
>
> This page contains the following errors:
>
> error on line 657 at column 38: PCDATA invalid Char value 12
> Below is a rendering of the page up to the first error.
>
> Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:
>
> XML Parsing Error: not well-formed
> Location:
> http://path/to/git/repo/blah/blah
>
> Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7
> using arch/ia64/kernel/unwind.c from the Linux kernel.
>
> Strip non-printable control-characters by piping the output produced
> by git-cat-file(1) to col(1) as follows:
>
> git cat-file blob deadbeef314159 | col -bx | highlight <args>
So my only real concern here is that `col` itself is going to munge
whitespace. Quoting from the col man page:
[...] and replaces white-space characters with tabs where
possible. [...]
Have you actually run into a situation where something like ^L was
present in a blob that was being passed to highlight?
- John
* Re: [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1)
From: Christopher M. Fuhrman @ 2011-08-16 21:32 UTC
To: J.H.; +Cc: git, jnareb, cwilson, sylvain
On Tue, 16 Aug 2011 at 1:30pm, J.H. wrote:
> On 08/16/2011 11:16 AM, Christopher M. Fuhrman wrote:
> > From: "Christopher M. Fuhrman" <cfuhrman@panix.com>
> >
> > The current code, as is, passes control characters, such as form-feed
> > (^L), to highlight, which then passes them through to the browser. This
> > causes the browser to display one of the following errors:
> >
<snip>
> > Strip non-printable control-characters by piping the output produced
> > by git-cat-file(1) to col(1) as follows:
> >
> > git cat-file blob deadbeef314159 | col -bx | highlight <args>
>
> So my only real concern here is that `col` itself is going to munge
> whitespace. Quoting from the col man page:
>
> [...] and replaces white-space characters with tabs where
> possible. [...]
I figured that would be a concern, which is why I added the -x option.
From the col(1) man page:
-x Output multiple spaces instead of tabs.
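
As a rough illustration of why emitting spaces instead of tabs should not shift any columns (the highlight invocation in the patch already passes --replace-tabs=8), the Python sketch below expands tabs at 8-column stops. It uses `str.expandtabs` as a stand-in and is not a reimplementation of col(1); the helper name `expand_tabs` is hypothetical.

```python
def expand_tabs(line: str, tabstop: int = 8) -> str:
    """Replace each tab with spaces up to the next tab stop."""
    return line.expandtabs(tabstop)


sample = "\tint x;\t/* comment */"
expanded = expand_tabs(sample)
# The comment starts at the same column it would occupy when tabs are
# rendered with 8-column stops.
comment_column = expanded.index("/*")
```

Here `comment_column` comes out as 16, the tab stop the second tab would have advanced to.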
I also took a diff between two XHTML files, one generated with col -bx and
one without. Here are the results:
--- withoutcol.xhtml 2011-08-16 14:11:39.000000000 -0700
+++ withcol.xhtml 2011-08-16 14:11:26.000000000 -0700
@@ -52,7 +52,7 @@
<span class="hl dir"># define DBG_CFG(args)</span>
<span class="hl dir">#endif</span>
-
+
<span class="hl com">/*</span>
<span class="hl com"> * Routines to access TIG registers.</span>
<span class="hl com"> */</span>
@@ -76,7 +76,7 @@
<span class="hl sym">*</span>tig_addr <span class="hl sym">= (</span><span class="hl kwb">unsigned long</span><span class="hl sym">)</span>value<span class="hl sym">;</span>
<span class="hl sym">}</span>
-
+
<span class="hl com">/*</span>
<span class="hl com"> * Given a bus, device, and function number, compute resulting</span>
<span class="hl com"> * configuration space address</span>
@@ -197,7 +197,7 @@
<span class="hl sym">.</span>write <span class="hl sym">=</span> titan_write_config<span class="hl sym">,</span>
<span class="hl sym">};</span>
(remainder stripped)
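
A comparison like the one above can be approximated in Python with `difflib`. This is only a sketch: the function name `diff_renderings` is hypothetical and the two inputs are tiny stand-ins for the real XHTML files. Lines are split on "\n" explicitly because `str.splitlines()` would also split on the form feed itself.

```python
import difflib


def diff_renderings(without_col: str, with_col: str) -> list:
    """Unified diff of two renderings, one line per list item."""
    return list(difflib.unified_diff(
        without_col.split("\n"),   # not splitlines(): that splits on ^L too
        with_col.split("\n"),
        fromfile="withoutcol.xhtml",
        tofile="withcol.xhtml",
        lineterm="",
    ))


before = ('<span class="hl dir">#endif</span>\n'
          '\x0c\n'
          '<span class="hl com">/*</span>\n')
after = before.replace("\x0c", "")   # stand-in for the filtered output
diff = diff_renderings(before, after)
```

In this toy input the only changed line is the one holding the form feed, which becomes an empty line, the same pattern as the hunks shown above.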
>
> Have you actually run into a situation where something like ^L was
> present in a blob that was being passed to highlight?
>
I've seen ^L in the Linux kernel source tree as well as the NetBSD src
tree. I've not encountered it elsewhere although I would think it would
be present depending on personal/corporate coding preferences.
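
To check whether a given blob contains form feeds, something along the lines of the Python sketch below could be used. The function name `find_form_feeds` is hypothetical; the bytes would come from wherever the blob is read, for example captured output of git cat-file.

```python
def find_form_feeds(data: bytes) -> list:
    """Return (line, column) pairs, both 1-based, for every ^L in data."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == 0x0C:  # form feed, the "Char value 12" in the error
                hits.append((lineno, col))
    return hits
```

The positions refer to the raw blob, so they will not match the line and column numbers a browser reports for the generated XHTML page.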
> - John
Cheers!
--
Chris Fuhrman
cfuhrman@panix.com