From mboxrd@z Thu Jan 1 00:00:00 1970
From: "J.H."
Subject: Re: [PATCH/RFC] gitweb: highlight: strip non-printable characters via col(1)
Date: Tue, 16 Aug 2011 13:30:22 -0700
Message-ID: <4E4AD35E.8060907@eaglescrag.net>
References: <1313518605-26460-1-git-send-email-cfuhrman@panix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: git@vger.kernel.org, jnareb@gmail.com, cwilson@cdwilson.us, sylvain@abstraction.fr
To: "Christopher M. Fuhrman"
X-From: git-owner@vger.kernel.org Tue Aug 16 22:30:40 2011
Return-path:
Envelope-to: gcvg-git-2@lo.gmane.org
Received: from vger.kernel.org ([209.132.180.67])
	by lo.gmane.org with esmtp (Exim 4.69) (envelope-from )
	id 1QtQHT-0001iu-Vi for gcvg-git-2@lo.gmane.org;
	Tue, 16 Aug 2011 22:30:36 +0200
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752422Ab1HPUaa (ORCPT ); Tue, 16 Aug 2011 16:30:30 -0400
Received: from shards.monkeyblade.net ([198.137.202.13]:48470
	"EHLO shards.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751887Ab1HPUaa (ORCPT );
	Tue, 16 Aug 2011 16:30:30 -0400
Received: from voot-cruiser.eaglescrag.net ([216.123.155.199]) (authenticated bits=0)
	by shards.monkeyblade.net (8.14.4/8.14.4) with ESMTP id p7GKUM8k005652
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Tue, 16 Aug 2011 13:30:23 -0700
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Lightning/1.0b3pre Thunderbird/3.1.10
In-Reply-To: <1313518605-26460-1-git-send-email-cfuhrman@panix.com>
X-Enigmail-Version: 1.1.2
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(shards.monkeyblade.net [198.137.202.13]); Tue, 16 Aug 2011 13:30:23 -0700 (PDT)
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
Archived-At:

On 08/16/2011 11:16 AM, Christopher M. Fuhrman wrote:
> From: "Christopher M. Fuhrman"
>
> The current code, as is, passes control characters, such as form-feed
> (^L) to highlight which then passes it through to the browser. This
> will cause the browser to display one of the following warnings:
>
> Safari v5.1 (6534.50) & Google Chrome v13.0.782.112:
>
>   This page contains the following errors:
>
>   error on line 657 at column 38: PCDATA invalid Char value 12
>   Below is a rendering of the page up to the first error.
>
> Mozilla Firefox 3.6.19 & Mozilla Firefox 5.0:
>
>   XML Parsing Error: not well-formed
>   Location:
>   http://path/to/git/repo/blah/blah
>
> Both errors were generated by gitweb.perl v1.7.3.4 w/ highlight 2.7
> using arch/ia64/kernel/unwind.c from the Linux kernel.
>
> Strip non-printable control-characters by piping the output produced
> by git-cat-file(1) to col(1) as follows:
>
>   git cat-file blob deadbeef314159 | col -bx | highlight

So my only real concern here is that `col` itself is going to munge
whitespace. Quoting from the col man page:

	[...] and replaces white-space characters with tabs where
	possible. [...]

Have you actually run into a situation where something like ^L was
present in a blob that was being passed to highlight?

- John
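
For illustration only: one way to address the whitespace concern would be
to drop just the control characters that XML 1.0 rejects (form feed, value
12, among them) and leave tabs, newlines and spaces exactly as they are in
the blob. The sketch below is in Python rather than gitweb's Perl, and the
function name strip_invalid_controls is hypothetical, not anything from
gitweb or the patch under discussion.

```python
import re

# XML 1.0 character data allows only three C0 control characters:
# tab (0x09), newline (0x0A) and carriage return (0x0D).
# Everything else below 0x20 (including form feed, 0x0C) is rejected.
_XML_INVALID_CTRL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_controls(text: str) -> str:
    """Remove control characters XML 1.0 rejects; leave whitespace untouched."""
    return _XML_INVALID_CTRL.sub("", text)


# A made-up sample line with a form feed, a tab and runs of spaces.
sample = "int x;\x0c\n\tint  y;   /* spaced */\n"
cleaned = strip_invalid_controls(sample)
print(repr(cleaned))
```

Because only the offending characters are deleted, indentation and
alignment in the source file reach highlight unchanged.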