From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/n] gitweb: Use '&iquot;' instead of '?' in esc_path
Date: Fri, 3 Nov 2006 23:33:49 +0100
Message-ID: <200611032333.49794.jnareb@gmail.com>
In-Reply-To: <7virhw5hoi.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> # quote unsafe characters and escape filename to HTML
>> sub esc_path {
>> my $str = shift;
>> $str = esc_html($str);
>> $str =~ s!([[:cntrl:]])!sprintf('<span class="cntrl">&#%04d;</span>', 9216+ord($1))!eg;
>> return $str;
>> }
>>
>> with perhaps the following CSS
>>
>> span.cntrl {
>> border: dashed #aaaaaa;
>> border-width: 1px;
>> padding: 0px 2px 0px 2px;
>> margin: 0px 2px 0px 2px;
>> }
>>
>> What do you think of it?
>
> Probably "# quote unsafe characters" is not what it does yet (it
> just quotes controls currently and nothing else), but we have to
> start somewhere and I think this is a good start.

Well, control characters (at least some of them) are not valid
characters in UTF-8 HTML output; Mozilla in strict XHTML mode complains
about them. Currently, for example, esc_html escapes the FORM FEED (FF)
and ESCAPE (ESC) characters, because they happened to be present in the
git.git repository (in the COPYING file and in commit
v1.4.2.1-g20a3847 respectively).

As I see it, we can either replace unsafe characters (control
characters) by a single character a la --hide-control-chars, which
is the minimal solution, or we can quote unsafe characters somehow,
but if we do that we have to indicate that we quote. Git core and
ls enclose material which needs escaping in quotes; in gitweb
that is somewhat impractical, and besides we have more possibilities
to mark a fragment of text (a span element enclosing the representation
of escaped characters, for example).

I have thought of the following escaping schemes:

1. Hide control characters using '?' or other similar character like
ċ for example
2. Use "Unicode" quoting, i.e. replace control characters by their
Unicode Printable Representation (PR), as shown above. Has the
advantage that it is simple and in theory does not need marking
that it is quoted; has the disadvantage that the browser must support
this part of Unicode, and that those characters are hard to read
at the default font size gitweb uses.
3. Use Character Escape Codes (CEC), using alphabetic and octal
backslash sequences like those used in C. Probably needs to escape
the backslash (the quoting character) too. Has the advantage of being
widely understood in the POSIX world, and that it works for all
characters: a simple octal backslash sequence is used if they have no
special escape sequence. Has the disadvantage of needing an escape
sequence table/hash.
4. Control key Sequence (CS), like the one used in esc_html currently,
replacing control characters by the key sequence that produces them,
for example replacing LF with ^J, CR with ^M, FF with ^L, ESC with
^[, TAB with ^I. Has the advantage of being understood, I think, in
the MS-DOS/MS Windows world. Has the advantage of being used in esc_html.
Has the advantage that some text editors use this representation.
Has the disadvantage of needing a large key sequence table/hash.
Has the disadvantage that less common control characters have cryptic
control key sequences.
5. Percent encoding, also known as URL encoding. Use the %<hex> encoding
used in URLs, taken for example from the core of the esc_url/esc_param
subroutines. Simple, but does need marking that it is escaped. Has the
disadvantage of being hardly readable.
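
To make the trade-offs in options 2 to 5 easier to compare, here is a
rough Perl sketch of each scheme. All subroutine names (quote_pr,
quote_cec, quote_cs, quote_percent) and the %cec table are hypothetical
and not part of gitweb; the sketch assumes plain strings, leaves out the
esc_html step and the span marking, and, as an assumption of mine, maps
DEL to U+2421 and also escapes '%' in the percent variant.

```perl
use strict;
use warnings;

# 2. Unicode Printable Representation: C0 controls map to the Control
#    Pictures block at U+2400..U+241F; DEL (0x7F) maps to U+2421.
sub quote_pr {
	my $str = shift;
	$str =~ s{([\x00-\x1f\x7f])}
	         {sprintf('&#%d;', ord($1) == 0x7f ? 0x2421 : 0x2400 + ord($1))}eg;
	return $str;
}

# 3. Character Escape Codes: C-style names where they exist, a three-digit
#    octal sequence otherwise. The backslash itself is escaped too.
my %cec = (
	"\t" => '\t', "\n" => '\n', "\r" => '\r', "\f" => '\f',
	"\b" => '\b', "\a" => '\a', "\013" => '\v', "\\" => '\\\\',
);

sub quote_cec {
	my $str = shift;
	$str =~ s{([\x00-\x1f\x7f\\])}
	         {exists $cec{$1} ? $cec{$1} : sprintf('\\%03o', ord($1))}eg;
	return $str;
}

# 4. Control key Sequence: caret notation. Flipping bit 0x40 turns
#    0x00..0x1F into '@'..'_' and DEL into '?', so this particular
#    sketch gets by without a lookup table.
sub quote_cs {
	my $str = shift;
	$str =~ s{([\x00-\x1f\x7f])}{'^' . chr(ord($1) ^ 0x40)}eg;
	return $str;
}

# 5. Percent encoding: %<hex>; '%' is escaped as well so that the
#    result stays unambiguous.
sub quote_percent {
	my $str = shift;
	$str =~ s{([\x00-\x1f\x7f%])}{sprintf('%%%02X', ord($1))}eg;
	return $str;
}

# Show all four renderings of one awkward filename.
my $name = "foo\tbar\n";
print "$_\n" for quote_pr($name), quote_cec($name),
                 quote_cs($name), quote_percent($name);
```

Options 3 to 5 all produce plain ASCII, so in gitweb they would still
need some marking (such as the span above) to tell quoted output from
a filename that literally contains those characters.
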

Which solution do you think is best? Or perhaps another solution, like
using Unicode Printable Representation or Character Escape Codes, with
the exception of LF which would be replaced by ¶ (paragraph sign),
RET by ↵ and TAB by either þ, ↦ or →.
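
The mixed variant might look roughly like the Perl sketch below. The
names quote_mixed and %glyph are hypothetical; the sketch assumes the
string has already been through esc_html, picks → for TAB purely for
illustration, and falls back to octal backslash sequences for every
other control character. Everything it rewrites is wrapped in the
span.cntrl element from the earlier proposal, which is why the
backslash is not escaped here.

```perl
use strict;
use warnings;

# Glyphs for the three most common controls, written as numeric
# character references so that no named entities are needed.
my %glyph = (
	"\n" => '&#182;',   # U+00B6 PILCROW SIGN
	"\r" => '&#8629;',  # U+21B5 DOWNWARDS ARROW WITH CORNER LEFTWARDS
	"\t" => '&#8594;',  # U+2192 RIGHTWARDS ARROW
);

# Replace each control character by its glyph, or by a three-digit
# octal escape if it has none, and mark the replacement with a span.
sub quote_mixed {
	my $str = shift;
	$str =~ s{([\x00-\x1f\x7f])}{
		'<span class="cntrl">'
		. (exists $glyph{$1} ? $glyph{$1} : sprintf('\\%03o', ord($1)))
		. '</span>'
	}eg;
	return $str;
}

print quote_mixed("foo\tbar\033"), "\n";
```
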
--
Jakub Narebski