git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/n] gitweb: Use '&iquot;' instead of '?' in esc_path
Date: Fri, 3 Nov 2006 23:33:49 +0100
Message-ID: <200611032333.49794.jnareb@gmail.com>
In-Reply-To: <7virhw5hoi.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> # quote unsafe characters and escape filename to HTML
>> sub esc_path {
>> 	my $str = shift;
>> 	$str = esc_html($str);
>> 	$str =~ s!([[:cntrl:]])!sprintf('<span class="cntrl">&#%04d;</span>', 9216+ord($1))!eg;
>> 	return $str;
>> }
>>
>> with perhaps the following CSS
>>
>> span.cntrl {
>> 	border: dashed #aaaaaa;
>> 	border-width: 1px;
>> 	padding: 0px 2px 0px 2px;
>> 	margin:  0px 2px 0px 2px;
>> }
>>
>> What do you think of it?
> 
> Probably "# quote unsafe characters" is not what it does yet (it
> just quotes controls currently and nothing else), but we have to
> start somewhere and I think this is a good start.
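A standalone sketch of the Control Pictures mapping that esc_path above relies on (9216 is 0x2400, the first code point of the Unicode "Control Pictures" block). The helper names are hypothetical, and the sketch only covers C0 controls and DEL: U+2400 + code point is correct for 0x00-0x1F, but the picture for DEL (0x7F) is U+2421, so a plain 9216+ord() would show DEL as an unrelated character.

```perl
use strict;
use warnings;

# Code point of the Control Pictures glyph for one control character.
# U+2400 + code point is valid only for the C0 range (0x00-0x1F);
# DEL (0x7F) has its own picture at U+2421.
sub control_picture {
	my $cp = ord(shift);
	return 0x2400 + $cp if $cp < 0x20;
	return 0x2421       if $cp == 0x7F;
	return undef;       # e.g. C1 controls have no dedicated picture
}

# Hypothetical esc_path variant: escape HTML metacharacters first, then
# replace each C0 control or DEL by its picture inside <span class="cntrl">.
sub esc_path_pictures {
	my $str = shift;
	$str =~ s/&/&amp;/g;
	$str =~ s/</&lt;/g;
	$str =~ s/>/&gt;/g;
	$str =~ s/"/&quot;/g;
	$str =~ s{([\x00-\x1f\x7f])}
	         {sprintf('<span class="cntrl">&#%d;</span>', control_picture($1))}ge;
	return $str;
}

# prints: a<span class="cntrl">&#9225;</span>b<span class="cntrl">&#9249;</span>
print esc_path_pictures("a\tb\x7f"), "\n";
```

Restricting the match to C0 and DEL (instead of [[:cntrl:]]) keeps the pattern and control_picture() in agreement, so sprintf never receives undef.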

Well, control characters (at least some of them) are not valid
characters in UTF-8 HTML output; Mozilla in strict XHTML mode complains
about them. Currently, for example, esc_html escapes the FORM FEED (FF)
and ESCAPE (ESC) characters, because they happened to be present in the
git.git repository (in the COPYING file and in commit
v1.4.2.1-g20a3847, respectively).
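A small sketch for checking which characters are the problem, based on the XML 1.0 Char production, which admits only TAB, LF and CR from the C0 control range. The helper name is hypothetical; it reports only C0 controls and ignores the other ranges XML 1.0 excludes.

```perl
use strict;
use warnings;

# XML 1.0 (and therefore XHTML) admits only TAB, LF and CR from the
# C0 control range; any other C0 control makes the document ill-formed.
# Returns the offending characters as U+XXXX strings.
sub xml_illegal_chars {
	my $str = shift;
	my @bad = $str =~ /([\x00-\x08\x0B\x0C\x0E-\x1F])/g;
	return map { sprintf('U+%04X', ord $_) } @bad;
}

# FF and ESC are both rejected; prints: U+000C U+001B
print join(' ', xml_illegal_chars("page\fbreak \x1b[1mbold")), "\n";
```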

As I see it, we can either replace unsafe characters (control
characters) by a single character, a la --hide-control-chars, which is
the minimal solution, or we can quote unsafe characters in some way;
but if we do that, we have to indicate that the text is quoted. Git
core and ls enclose material which needs escaping in quotes; in gitweb
that is somewhat impractical. Besides, we have more ways to mark a
fragment of text (for example, a span element enclosing the
representation of the escaped characters).
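The two approaches from the paragraph above, sketched side by side. Both helper names are hypothetical, '?' is just an assumed default placeholder, and mark_controls assumes its input has already been HTML-escaped.

```perl
use strict;
use warnings;

# Minimal solution, in the spirit of "ls --hide-control-chars": each
# control character becomes a single placeholder character.  A real
# '?' in a filename is then indistinguishable from a hidden control.
sub hide_controls {
	my ($str, $placeholder) = @_;
	$placeholder = '?' unless defined $placeholder;
	$str =~ s/[[:cntrl:]]/$placeholder/g;
	return $str;
}

# Marked quoting: same placeholder, but wrapped in a <span> so CSS
# (e.g. span.cntrl) can render it differently from literal text.
sub mark_controls {
	my ($str, $placeholder) = @_;
	$placeholder = '?' unless defined $placeholder;
	$str =~ s{[[:cntrl:]]}{<span class="cntrl">$placeholder</span>}g;
	return $str;
}

print hide_controls("a\tb"), "\n";  # a?b
print hide_controls("a?b"),  "\n";  # a?b (ambiguous with the line above)
print mark_controls("a\tb"), "\n";  # a<span class="cntrl">?</span>b
```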

I have thought of the following escaping:

1. Hide control characters using '?' or another similar character,
   for example &cdot;.
2. Use "Unicode" quoting, i.e. replace control characters by their
   Unicode Printable Representation (PR), as shown above. Has the
   advantage that it is simple and theoretically does not need marking
   that it is quoted; has the disadvantage that the browser must
   support this part of Unicode, and that those characters are hard to
   read at the default font size gitweb uses.
3. Use Character Escape Codes (CEC), i.e. alphabetic and octal
   backslash sequences like those used in C. Probably needs escaping of
   the backslash (the quoting character) too. Has the advantage of
   being widely understood in the POSIX world, and that it works for
   all characters: a simple octal backslash sequence is used when a
   character has no special escape sequence. Has the disadvantage of
   needing an escape sequence table/hash.
4. Control key Sequence (CS), like the one currently used in esc_html,
   replacing control characters by the key sequence that produces them,
   for example LF with ^J, CR with ^M, FF with ^L, ESC with ^[ and TAB
   with ^I. Has the advantages of being understood, I think, in the
   MS-DOS/MS Windows world, of already being used in esc_html, and of
   being used by some text editors. Has the disadvantages of needing a
   large key sequence table/hash, and that less common control
   characters have cryptic control key sequences.
5. Percent encoding, also known as URL encoding. Use the %<hex>
   encoding used in URLs, taken for example from the core of the
   esc_url/esc_param subroutines. Simple, but does need marking that
   the text is escaped. Has the disadvantage of being hardly readable.
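Options 3 to 5 can each be sketched in a few lines. These are hypothetical helpers, not gitweb's esc_html or esc_url, and they return plain text that would still need HTML-escaping before output. One observation for option 4: for the C0 range and DEL, the control key sequence can be computed by flipping bit 0x40 instead of being looked up.

```perl
use strict;
use warnings;

# Option 3: C-style Character Escape Codes.  Controls with a conventional
# letter escape use it; all others fall back to a three-digit octal
# escape.  The backslash, being the quoting character, is escaped too.
my %cec = (
	"\a"   => '\a', "\b" => '\b', "\t" => '\t', "\n" => '\n',
	"\x0b" => '\v', "\f" => '\f', "\r" => '\r', "\\" => '\\\\',
);

sub quote_cec {
	my $str = shift;
	$str =~ s{([[:cntrl:]\\])}
	         {exists $cec{$1} ? $cec{$1} : sprintf('\\%03o', ord $1)}ge;
	return $str;
}

# Option 4: Control key Sequence (caret notation).  Flipping bit 0x40
# gives the key: TAB (0x09) -> ^I, ESC (0x1B) -> ^[, DEL (0x7F) -> ^?.
# A literal '^' is left alone, so the result is ambiguous unless the
# quoted characters are marked some other way.
sub quote_caret {
	my $str = shift;
	$str =~ s{([\x00-\x1f\x7f])}{'^' . chr(ord($1) ^ 0x40)}ge;
	return $str;
}

# Option 5: percent (URL) encoding of control characters; '%' itself
# must be encoded as well, since it is the quoting character here.
sub quote_percent {
	my $str = shift;
	$str =~ s{([[:cntrl:]%])}{sprintf('%%%02X', ord $1)}ge;
	return $str;
}

print quote_cec("a\tb\x1b"), "\n";      # a\tb\033
print quote_caret("a\tb\x1b"), "\n";    # a^Ib^[
print quote_percent("a\tb\x1b"), "\n";  # a%09b%1B
```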

Which solution do you think is best? Or perhaps some other solution,
like using Unicode Printable Representation or Character Escape Codes,
with the exception of LF, which would be replaced by &para; (paragraph
sign), CR (RET) by &crarr;, and TAB by either &thorn;, &#8614; or &rarr;.
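The hybrid from the last paragraph can be sketched as follows, assuming Character Escape Codes as the fallback and picking &rarr; for TAB. All names are hypothetical; each quoted character is wrapped in the span.cntrl element from the proposed CSS.

```perl
use strict;
use warnings;

# Hypothetical hybrid: LF, CR and TAB are shown as visible HTML
# entities; every other control character (and the backslash, the
# quoting character) gets a C-style backslash escape.
my %glyph = (
	"\n" => '&para;',   # LF  -> pilcrow sign
	"\r" => '&crarr;',  # CR  -> downwards arrow with corner leftwards
	"\t" => '&rarr;',   # TAB -> rightwards arrow
);

# Representation of one quoted character, wrapped in a span.
sub quote_char_hybrid {
	my ($c) = @_;    # copy at once: $_[0] is an alias for the caller's $1
	my $rep = exists $glyph{$c} ? $glyph{$c}
	        : $c eq '\\'        ? '\\\\'
	        : sprintf('\\%03o', ord $c);
	return qq{<span class="cntrl">$rep</span>};
}

sub quote_hybrid {
	my $str = shift;
	# HTML-escape first, so the entities added below are not escaped again.
	$str =~ s/&/&amp;/g;
	$str =~ s/</&lt;/g;
	$str =~ s/>/&gt;/g;
	$str =~ s/"/&quot;/g;
	$str =~ s{([[:cntrl:]\\])}{quote_char_hybrid($1)}ge;
	return $str;
}

# prints: a<span class="cntrl">&rarr;</span>b<span class="cntrl">\033</span>&lt;c&gt;
print quote_hybrid("a\tb\x1b<c>"), "\n";
```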

-- 
Jakub Narebski


Thread overview: 46+ messages
2006-10-30 18:53 [PATCH 0/n] gitweb: Better quoting and New improved patchset view Jakub Narebski
2006-10-30 18:58 ` [PATCH/RFC 1/n] gitweb: Better git-unquoting and gitweb-quoting of pathnames Jakub Narebski
2006-11-03  8:15   ` Junio C Hamano
2006-11-03 10:59     ` Jakub Narebski
2006-11-03 11:58       ` Junio C Hamano
2006-11-03 12:09         ` Jakub Narebski
2006-10-30 18:59 ` [PATCH 2/n] gitweb: Use '&iquot;' instead of '?' in esc_path Jakub Narebski
2006-10-31  0:34   ` Junio C Hamano
2006-10-31  1:27     ` Junio C Hamano
2006-10-31  9:23       ` Jakub Narebski
2006-11-03 16:19       ` Jakub Narebski
2006-11-03 21:44         ` Junio C Hamano
2006-11-03 22:33           ` Jakub Narebski [this message]
2006-11-03 22:44             ` Junio C Hamano
2006-11-03 22:50               ` Petr Baudis
2006-11-03 23:35                 ` Jakub Narebski
2006-11-04  0:02                 ` Junio C Hamano
2006-11-04 10:31                   ` Petr Baudis
2006-11-06 21:58             ` Jakub Narebski
2006-11-06 22:47               ` Junio C Hamano
2006-11-06 23:16                 ` Jakub Narebski
     [not found]                   ` <7vwt68b0f3.fsf@assigned-by-dhcp.cox.net>
2006-11-07  0:02                     ` Jakub Narebski
2006-11-07 21:53                 ` Jakub Narebski
2006-11-07 22:18                   ` Junio C Hamano
2006-10-30 21:25 ` [PATCH 3/n] gitweb: Use 's' regexp modifier to secure against filenames with LF Jakub Narebski
2006-10-30 21:29 ` [PATCH 4/n] gitweb: Secure against commit-ish/tree-ish with the same name as path Jakub Narebski
2006-10-31 16:53   ` Jakub Narebski
2006-11-01  0:24     ` Junio C Hamano
2006-11-01  0:40       ` Jakub Narebski
2006-11-02  1:01         ` Junio C Hamano
2006-11-02  8:49           ` Jakub Narebski
2006-11-03  6:18             ` Junio C Hamano
2006-11-03  9:35               ` Junio C Hamano
2006-11-03 10:49                 ` Jakub Narebski
2006-10-31 14:22 ` [PATCH 5/n] [take 3] gitweb: New improved patchset view Jakub Narebski
2006-11-03 10:26   ` [PATCH 5/10] " Jakub Narebski
2006-10-31 16:07 ` [PATCH 6/n] gitweb: Remove redundant "blob" links from git_difftree_body Jakub Narebski
2006-11-03  6:41   ` Junio C Hamano
2006-11-03 11:01     ` Jakub Narebski
2006-10-31 16:36 ` [PATCH 7/n] gitweb: Output also empty patches in "commitdiff" view Jakub Narebski
2006-11-03 11:56   ` Jakub Narebski
2006-10-31 16:43 ` [PATCH 8/n] gitweb: Fix two issues with quoted filenames in git_patchset_body Jakub Narebski
2006-11-01 13:33 ` [PATCH 9/n] gitweb: Better support for non-CSS aware web browsers Jakub Narebski
2006-11-01 13:38   ` Petr Baudis
2006-11-01 13:36 ` [PATCH 10/n] gitweb: New improved formatting of chunk header in diff Jakub Narebski
2006-11-01 18:52 ` [PATCH 00/10] gitweb: Better quoting and New improved patchset view Jakub Narebski
