From: Jakub Narebski
Subject: Re: [PATCH] gitweb: highlight: strip non-printable characters via col(1)
Date: Fri, 26 Aug 2011 21:54:13 +0200
Message-ID: <201108262154.14493.jnareb@gmail.com>
In-Reply-To: <1314053923-13122-1-git-send-email-cfuhrman@panix.com>
To: "Christopher M. Fuhrman"
Cc: gitster@pobox.com, git@vger.kernel.org, cwilson@cdwilson.us, sylvain@abstraction.fr

On Tue, 23 Aug 2011, Christopher M. Fuhrman wrote:

> The current code, as is, passes control characters, such as form-feed
> (^L) to highlight which then passes it through to the browser.  This
> will cause the browser to display one of the following warnings:
>
> Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:
>
>    This page contains the following errors:
>
>    error on line 657 at column 38: PCDATA invalid Char value 12
>    Below is a rendering of the page up to the first error.
>
> Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:
>
>    XML Parsing Error: not well-formed
>    Location:
>    http://path/to/git/repo/blah/blah
>
> Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7
> using arch/ia64/kernel/unwind.c from the Linux kernel.
>
> Strip non-printable control-characters by piping the output produced
> by git-cat-file(1) to col(1) as follows:
>
>    git cat-file blob deadbeef314159 | col -bx | highlight
>
> Note usage of the '-x' option which tells col(1) to output multiple
> spaces instead of tabs.

Why use an external program (which might not be installed, or might
not strip control characters) instead of making gitweb sanitize the
highlighter output itself?

Something like the patch below (which additionally shows where there
are control characters):

-- >8 --
diff --git i/gitweb/gitweb.perl w/gitweb/gitweb.perl
index 7cf12af..192db2c 100755
--- i/gitweb/gitweb.perl
+++ w/gitweb/gitweb.perl
@@ -1517,6 +1517,17 @@ sub esc_path {
 	return $str;
 }
 
+# Sanitize for use in XHTML + application/xhtml+xml
+sub sanitize {
+	my $str = shift;
+
+	return undef unless defined $str;
+
+	$str = to_utf8($str);
+	$str =~ s|([[:cntrl:]])|quot_cec($1)|eg;
+	return $str;
+}
+
 # Make control characters "printable", using character escape codes (CEC)
 sub quot_cec {
 	my $cntrl = shift;
@@ -6546,7 +6557,8 @@ sub git_blob {
 			$nr++;
 			$line = untabify($line);
 			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
-			       $nr, esc_attr(href(-replay => 1)), $nr, $nr, $syntax ? to_utf8($line) : esc_html($line, -nbsp=>1);
+			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
+			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
 		}
 	}
 	close $fd
-- 8< --

-- 
Jakub Narebski
Poland
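
The substitution at the heart of sanitize() can be tried outside gitweb.
What follows is only a minimal standalone sketch, not the patch itself:
show_cntrl() is a hypothetical stand-in for gitweb's quot_cec(), its
escape table and <span> markup are assumptions made for illustration,
and the to_utf8() step is left out so the script has no gitweb
dependencies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for gitweb's quot_cec(): turn one control
# character into a visible escape wrapped in a <span>.
sub show_cntrl {
	my $cntrl = shift;
	my %names = ("\t" => '\t', "\f" => '\f', "\r" => '\r', "\e" => '\e');
	my $text = exists $names{$cntrl}
		? $names{$cntrl}
		: sprintf('\x%02x', ord($cntrl));
	return qq{<span class="cntrl">$text</span>};
}

# Replace every control character in an already-highlighted line.
# Assumption: the line is chomped and untabified beforehand, as in
# git_blob, so newlines and tabs are not expected here.
sub sanitize_line {
	my $str = shift;
	return undef unless defined $str;
	$str =~ s/([[:cntrl:]])/show_cntrl($1)/eg;
	return $str;
}

# A highlighted line that still carries a form feed (^L).
my $line = qq{<span class="kwd">int</span> x;\f};
print sanitize_line($line), "\n";
# prints: <span class="kwd">int</span> x;<span class="cntrl">\f</span>
```

Because the markup produced by the highlighter is left untouched and
only the control characters are rewritten, the result stays well-formed
and the reader can still see where the stray character was.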