git.vger.kernel.org archive mirror
* [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1)
@ 2011-08-16 18:16 Christopher M. Fuhrman
  2011-08-16 20:30 ` J.H.
  0 siblings, 1 reply; 3+ messages in thread
From: Christopher M. Fuhrman @ 2011-08-16 18:16 UTC (permalink / raw)
  To: git; +Cc: jnareb, cwilson, sylvain, Christopher M. Fuhrman

From: "Christopher M. Fuhrman" <cfuhrman@panix.com>

The current code passes control characters, such as form-feed (^L),
to highlight, which then passes them through to the browser.  This
causes the browser to display one of the following errors:

Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:

  This page contains the following errors:

  error on line 657 at column 38: PCDATA invalid Char value 12
  Below is a rendering of the page up to the first error.

Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:

   XML Parsing Error: not well-formed
   Location:
   http://path/to/git/repo/blah/blah

Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7
using arch/ia64/kernel/unwind.c from the Linux kernel.

Strip non-printable control-characters by piping the output produced
by git-cat-file(1) to col(1) as follows:

  git cat-file blob deadbeef314159 | col -bx | highlight <args>

Tested on OpenSuSE 11.4 and NetBSD 5.1, with perl 5.12.3 and perl
5.12.2 respectively, using Safari, Firefox, and Google Chrome.

Signed-off-by: Christopher M. Fuhrman <cfuhrman@panix.com>
---

For an example of this bug in action, see:

 * http://git.fuhrbear.com/~cfuhrman/?p=linux/.git;a=blob;f=arch/alpha/kernel/core_titan.c;h=219bf271c0ba2e5f2d668af707df57fbbd00ccfd;hb=HEAD
 * http://git.fuhrbear.com/~cfuhrman/?p=linux/.git;a=blob;f=arch/ia64/kernel/unwind.c;h=fed6afa2e8a9014e65229e51e64fa4b1c13cc284;hb=HEAD

WRT the col(1) command, I've verified that the binary is installed in
/usr/bin on OpenSuSE, NetBSD, OpenBSD, Solaris 10, and AIX.  This
patch assumes that /usr/bin is in $PATH.
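
As a rough illustration of the filter step, the sketch below feeds a
line containing a form feed through col -bx and dumps the result.  The
helper name strip_ctrl is hypothetical and not part of the patch.  The
tr(1) fallback is an assumption for systems without col(1); it only
deletes form feeds, so it is not equivalent to col.

```shell
#!/bin/sh
# Hypothetical helper, not part of gitweb: filter stdin the way the
# patched pipeline would, falling back to tr(1) if col(1) is missing.
strip_ctrl() {
    if command -v col >/dev/null 2>&1; then
        col -bx
    else
        # Fallback (assumption): delete only form-feed characters.
        tr -d '\014'
    fi
}

# A line with a form feed (octal 014) between two words.
printf 'foo\014bar\n' | strip_ctrl | od -c
```

If the filter works, the od -c dump should show foobar followed by a
newline, with no \f in between.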

Cheers!

 gitweb/gitweb.perl |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 81dacf2..38d5d4e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3656,6 +3656,7 @@ sub run_highlighter {
 
 	close $fd;
 	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
+	          "col -bx | ".
 	          quote_command($highlight_bin).
 	          " --replace-tabs=8 --fragment --syntax $syntax |"
 		or die_error(500, "Couldn't open file or run syntax highlighter");
-- 
1.7.5.4


* Re: [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1)
  2011-08-16 18:16 [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1) Christopher M. Fuhrman
@ 2011-08-16 20:30 ` J.H.
  2011-08-16 21:32   ` Christopher M. Fuhrman
  0 siblings, 1 reply; 3+ messages in thread
From: J.H. @ 2011-08-16 20:30 UTC (permalink / raw)
  To: Christopher M. Fuhrman; +Cc: git, jnareb, cwilson, sylvain

On 08/16/2011 11:16 AM, Christopher M. Fuhrman wrote:
> From: "Christopher M. Fuhrman" <cfuhrman@panix.com>
> 
> The current code, as is, passes control characters, such as form-feed
> (^L) to highlight which then passes it through to the browser.  This
> will cause the browser to display one of the following warnings:
> 
> Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:
> 
>   This page contains the following errors:
> 
>   error on line 657 at column 38: PCDATA invalid Char value 12
>   Below is a rendering of the page up to the first error.
> 
> Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:
> 
>    XML Parsing Error: not well-formed
>    Location:
>    http://path/to/git/repo/blah/blah
> 
> Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7
> using arch/ia64/kernel/unwind.c from the Linux kernel.
> 
> Strip non-printable control-characters by piping the output produced
> by git-cat-file(1) to col(1) as follows:
> 
>   git cat-file blob deadbeef314159 | col -bx | highlight <args>

So my only real concern here is that `col` itself is going to munge
whitespace.  Quoting from the col man page:

	[...] and replaces white-space characters with tabs where
	    possible. [...]

Have you actually run into a situation where something like ^L was
present in a blob that was being passed to highlight?

- John


* Re: [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1)
  2011-08-16 20:30 ` J.H.
@ 2011-08-16 21:32   ` Christopher M. Fuhrman
  0 siblings, 0 replies; 3+ messages in thread
From: Christopher M. Fuhrman @ 2011-08-16 21:32 UTC (permalink / raw)
  To: J.H.; +Cc: git, jnareb, cwilson, sylvain

On Tue, 16 Aug 2011 at 1:30pm, J.H. wrote:

> On 08/16/2011 11:16 AM, Christopher M. Fuhrman wrote:
> > From: "Christopher M. Fuhrman" <cfuhrman@panix.com>
> >
> > The current code, as is, passes control characters, such as form-feed
> > (^L) to highlight which then passes it through to the browser.  This
> > will cause the browser to display one of the following warnings:
> >

<snip>

> > Strip non-printable control-characters by piping the output produced
> > by git-cat-file(1) to col(1) as follows:
> >
> >   git cat-file blob deadbeef314159 | col -bx | highlight <args>
>
> So my only real concern here is that `col` itself is going to munge
> whitespace.  Quoting from the col man page:
>
> 	[...] and replaces white-space characters with tabs where
> 	    possible. [...]

I figured that would be a concern, which is why I added the -x option.
From the col(1) man page:

  -x        Output multiple spaces instead of tabs.
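
A quick way to see the difference on a given system is sketched below.
The helper name show_col_modes is hypothetical, and the exact output of
plain col -b may vary between col implementations.

```shell
#!/bin/sh
# Illustration only: compare col(1) output with and without -x.
# show_col_modes is a hypothetical helper name.
show_col_modes() {
    if ! command -v col >/dev/null 2>&1; then
        echo "col(1) not found in PATH" >&2
        return 1
    fi
    # Input: "a", eight spaces, "b".
    echo "col -b:"
    printf 'a        b\n' | col -b | od -c
    echo "col -bx:"
    printf 'a        b\n' | col -bx | od -c
}

show_col_modes || true
```

With -b alone, col may turn part of the run of spaces into a tab; with
-bx the spaces should come through unchanged.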

I also took a diff between two XHTML files, one generated with col -bx
and one without.  Here are the results:

--- withoutcol.xhtml	2011-08-16 14:11:39.000000000 -0700
+++ withcol.xhtml	2011-08-16 14:11:26.000000000 -0700
@@ -52,7 +52,7 @@
 <span class="hl dir"># define DBG_CFG(args)</span>
 <span class="hl dir">#endif</span>

-
+
 <span class="hl com">/*</span>
 <span class="hl com"> * Routines to access TIG registers.</span>
 <span class="hl com"> */</span>
@@ -76,7 +76,7 @@
         <span class="hl sym">*</span>tig_addr <span class="hl sym">= (</span><span class="hl kwb">unsigned long</span><span class="hl sym">)</span>value<span class="hl sym">;</span>
 <span class="hl sym">}</span>

-
+
 <span class="hl com">/*</span>
 <span class="hl com"> * Given a bus, device, and function number, compute resulting</span>
 <span class="hl com"> * configuration space address</span>
@@ -197,7 +197,7 @@
         <span class="hl sym">.</span>write <span class="hl sym">=</span>        titan_write_config<span class="hl sym">,</span>
 <span class="hl sym">};</span>

(remainder stripped)

>
> Have you actually run into a situation where something like ^L was
> present in a blob that was being passed to highlight?
>

I've seen ^L in the Linux kernel source tree as well as the NetBSD src
tree.  I've not encountered it elsewhere, although I would expect it to
turn up depending on personal/corporate coding preferences.
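
To check whether a given tree contains form feeds at all, something
like the following could be used.  list_ff_files is a hypothetical
helper; it assumes a grep that supports -r, which POSIX does not
require.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that contain a
# form-feed character (^L, octal 014).
list_ff_files() {
    ff=$(printf '\014')
    grep -rl -- "$ff" "$1"
}

# Example: scan the directory given as the first argument, if any.
if [ "$#" -gt 0 ]; then
    list_ff_files "$1" || true
fi
```

Run against a checkout of the kernel tree, this should list files such
as the two mentioned earlier in the thread, provided they still carry
the ^L characters.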

> - John

Cheers!

-- 
Chris Fuhrman
cfuhrman@panix.com

