git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gitweb: protect blob and diff output lines from controls.
@ 2006-11-08 23:34 Junio C Hamano
  2006-11-09  0:04 ` Jakub Narebski
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2006-11-08 23:34 UTC (permalink / raw)
  To: Jakub Narebski, Petr Baudis; +Cc: git

This reuses the quot_cec to protect blob and text diff output
from leaking control characters.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 * requesting extra sets of eyeballs.

 gitweb/gitweb.perl |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f46d678..b5b1011 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -597,11 +597,10 @@ sub esc_html ($;%) {
 
 	$str = to_utf8($str);
 	$str = escapeHTML($str);
-	$str =~ s/\014/^L/g; # escape FORM FEED (FF) character (e.g. in COPYING file)
-	$str =~ s/\033/^[/g; # "escape" ESCAPE (\e) character (e.g. commit 20a3847d8a5032ce41f90dcc68abfb36e6fee9b1)
 	if ($opts{'-nbsp'}) {
 		$str =~ s/ /&nbsp;/g;
 	}
+	$str =~ s|([[:cntrl:]])|(($1 ne "\t") ? quot_cec($1) : $1)|eg;
 	return $str;
 }
 
@@ -1900,17 +1899,17 @@ sub git_print_page_path {
 			$fullname .= ($fullname ? '/' : '') . $dir;
 			print $cgi->a({-href => href(action=>"tree", file_name=>$fullname,
 			                             hash_base=>$hb),
-			              -title => $fullname}, esc_path($dir));
+			              -title => esc_html($fullname)}, esc_path($dir));
 			print " / ";
 		}
 		if (defined $type && $type eq 'blob') {
 			print $cgi->a({-href => href(action=>"blob_plain", file_name=>$file_name,
 			                             hash_base=>$hb),
-			              -title => $name}, esc_path($basename));
+			              -title => esc_html($name)}, esc_path($basename));
 		} elsif (defined $type && $type eq 'tree') {
 			print $cgi->a({-href => href(action=>"tree", file_name=>$file_name,
 			                             hash_base=>$hb),
-			              -title => $name}, esc_path($basename));
+			              -title => esc_html($name)}, esc_path($basename));
 			print " / ";
 		} else {
 			print esc_path($basename);
-- 
1.4.4.rc1.g659d


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-08 23:34 [PATCH] gitweb: protect blob and diff output lines from controls Junio C Hamano
@ 2006-11-09  0:04 ` Jakub Narebski
  2006-11-09  0:15   ` Junio C Hamano
  2006-11-09  9:24   ` Jakub Narebski
  0 siblings, 2 replies; 12+ messages in thread
From: Jakub Narebski @ 2006-11-09  0:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> This reuses the quot_cec to protect blob and text diff output
> from leaking control characters.
> 
> Signed-off-by: Junio C Hamano <junkio@cox.net>
> ---
>  * requesting extra sets of eyeballs.

This changes the "blob" and "text diff" output somewhat, as earlier it 
used Control key Sequence (CS) representation for some non-whitespace 
control characters (not "\t' not '\n'), namely replacing form feed (FF) 
('\f', '\014') with ^L and escape (ESC) ('\e', '\033') with ^[.

And (what is not said in the commit message) it additionally esc_html 
some title elements (the subroutine should be I think named esc_attr).

The problems are:
1. First, esc_path should _not_ use subroutine which does it's own 
contol characters escaping. That was also a mistake I made in my patch.
Perhaps we should have some quot_html or to_html subroutine which does 
_only_ to_utf8 (decode from Encode module), escapeHTML and optionally 
s/ /&nbsp;/g conversion.

2. In my opinion CS is better than CEC for quoting/escaping control 
characters in the "bulk" output, namely "blob" output and "text 
diff" (patchset body) output. CEC is better for pathnames (which must 
fit in one line), and perhaps other one-liners; perhaps not. I'm not 
sure what quoting to choose for esc_attr, but there we could use even 
--no-control-chars quoting (replacing any control character by '?'); 
but perhaps in some cases like git_print_page_path subroutine CEC is 
better.

BTW. what had happened with to_qtext post?
-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  0:04 ` Jakub Narebski
@ 2006-11-09  0:15   ` Junio C Hamano
  2006-11-09  0:46     ` Jakub Narebski
  2006-11-09  9:24   ` Jakub Narebski
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2006-11-09  0:15 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> 1. First, esc_path should _not_ use subroutine which does it's own 
> contol characters escaping. That was also a mistake I made in my patch.
> Perhaps we should have some quot_html or to_html subroutine which does 
> _only_ to_utf8 (decode from Encode module), escapeHTML and optionally 
> s/ /&nbsp;/g conversion.

I hated that original arrangement, but I do not see anything
obviously wrong in the output with the patch you are responding
to.  Except that git_blame2 is missing a chomp() on "my $data"
after finishing the metainfo loop, that is.

> 2. In my opinion CS is better than CEC for quoting/escaping control 
> characters in the "bulk" output, namely "blob" output and "text 
> diff" (patchset body) output. CEC is better for pathnames (which must 
> fit in one line), and perhaps other one-liners; perhaps not.

I am more for code reuse and consistency.  If "^L" is more
readable then we should consistently use it for both contents
and pathnames.  One of my tests were a symlink that points at a
funny filename ;-).

> BTW. what had happened with to_qtext post?

Sorry, I don't recall.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  0:15   ` Junio C Hamano
@ 2006-11-09  0:46     ` Jakub Narebski
  2006-11-09  1:10       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Narebski @ 2006-11-09  0:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> 1. First, esc_path should _not_ use subroutine which does it's own 
>> contol characters escaping. That was also a mistake I made in my patch.
>> Perhaps we should have some quot_html or to_html subroutine which does 
>> _only_ to_utf8 (decode from Encode module), escapeHTML and optionally 
>> s/ /&nbsp;/g conversion.
> 
> I hated that original arrangement, 

What did you hate, again?

>                                    but I do not see anything 
> obviously wrong in the output with the patch you are responding
> to.  Except that git_blame2 is missing a chomp() on "my $data"
> after finishing the metainfo loop, that is.

The original (mine) code for esc_path uses esc_html, which did it's
own partial (very partial) special characters esaping, namely
\014 (\f) => ^L, \033 (\e) => ^[. So if pathname had form feed character,
it would be replaced by ^L, not '\f'.

You have added quot_cec to esc_html subroutine directly. I don't know
what is your version of esc_html after the changes you made, but
this makes escaping part (quot) in esc_path never invoked. 

>> 2. In my opinion CS is better than CEC for quoting/escaping control 
>> characters in the "bulk" output, namely "blob" output and "text 
>> diff" (patchset body) output. CEC is better for pathnames (which must 
>> fit in one line), and perhaps other one-liners; perhaps not.
> 
> I am more for code reuse and consistency.  If "^L" is more
> readable then we should consistently use it for both contents
> and pathnames.  

Well, the pathname has the limit that it must be in single line
after quoting. The "blob" output is multipage. IMHO CEC like \n, \f,
\t are better in pathnames because this is what ls uses, while CS
for "blob" output is better because editors (including one true
editor being GNU Emacs ;-) uses CS like ^L (there is no end-of-line
as we split on LF and chomp; there is no tab character because line
is untabified first). But that is my opinion.

I think that conrol characters in filenames (in esc_path) should
be encompassed with <span class="cntrl">...</span> and styled.
I'm not sure if in "blob" view they should be styled. For sure
there should be no <span>...</span> for escaped attributed (future
esc_attr). Common to_html/quot_html would give us code reuse (as gives
quot_cec), if not consistent.

>                One of my tests were a symlink that points at a 
> funny filename ;-).

This should be IMHO solved rather by better "tree" view support
for symlinks, 'symlink' -> 'target' like in ls -l output.

>> BTW. what had happened with to_qtext post?
> 
> Sorry, I don't recall.

There was quite a bit of discussion about name of _suggested_
filename in blob_plain, blobdiff_plain view, namely the 
  -content_disposition => 'inline; filename="' ...
HTTP header. The result (probably lost in the noise) was to
add to_qtext subroutine for that.

Time to go to sleep...
-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  0:46     ` Jakub Narebski
@ 2006-11-09  1:10       ` Junio C Hamano
  2006-11-09  9:34         ` Jakub Narebski
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2006-11-09  1:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano wrote:
>> Jakub Narebski <jnareb@gmail.com> writes:
>> 
>>> 1. First, esc_path should _not_ use subroutine which does it's own 
>>> contol characters escaping. That was also a mistake I made in my patch.
>>> Perhaps we should have some quot_html or to_html subroutine which does 
>>> _only_ to_utf8 (decode from Encode module), escapeHTML and optionally 
>>> s/ /&nbsp;/g conversion.
>> 
>> I hated that original arrangement, 
>
> What did you hate, again?

esc_path calling esc_html you mentioned, of course.


>> obviously wrong in the output with the patch you are responding
>> to.  Except that git_blame2 is missing a chomp() on "my $data"
>> after finishing the metainfo loop, that is.
>
> The original (mine) code for esc_path uses esc_html, which did it's
> own partial (very partial) special characters esaping, namely
> \014 (\f) => ^L, \033 (\e) => ^[. So if pathname had form feed character,
> it would be replaced by ^L, not '\f'.

I know -- that is what I meant by "code reuse and consistency".

> You have added quot_cec to esc_html subroutine directly. I don't know
> what is your version of esc_html after the changes you
> made,...

See "pu".

> Well, the pathname has the limit that it must be in single line
> after quoting. The "blob" output is multipage.

I honestly have _no_ idea what distincition you are seeing
here.  Both blob and diff output are processed one line at a
time and its result would be on a single line too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  0:04 ` Jakub Narebski
  2006-11-09  0:15   ` Junio C Hamano
@ 2006-11-09  9:24   ` Jakub Narebski
  2006-11-09  9:55     ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Jakub Narebski @ 2006-11-09  9:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Jakub Narebski wrote:
> I'm not sure what quoting to choose for esc_attr, but there we could
> use even --no-control-chars quoting (replacing any control character
> by '?');  but perhaps in some cases like git_print_page_path
> subroutine CEC is better.

I'm rambling. esc_attr is special case, because CGI does escapeHTML
(and I hope also to_utf8) for us. Using <span class="cntrl">...</span>
has also no sense. So there should be separate esc_attr_path subroutine
I think.

Even if we decide that esc_html and esc_path should give identical
output (the difference that _might_ be here is that in esc_html we
don't need to escape whitespace control characters valid in HTML,
like tab (HT, TAB) or newline (LF); on the other hand thanks to
line-by-line processing we should never get newline in "blob", and
thanks to untabify we should never get tab in "blob") I think it would
be prudent to have esc_path, even as thin wrapper just caling esc_html.

We might decide to use different style for control characters in
different views, but that I think can be done using pure CSS.
-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  1:10       ` Junio C Hamano
@ 2006-11-09  9:34         ` Jakub Narebski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Narebski @ 2006-11-09  9:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Dnia czwartek 9. listopada 2006 02:10, Junio C Hamano napisał:
> Jakub Narebski <jnareb@gmail.com> writes:

>> Well, the pathname has the limit that it must be in single line
>> after quoting. The "blob" output is multipage.
> 
> I honestly have _no_ idea what distincition you are seeing
> here.  Both blob and diff output are processed one line at a
> time and its result would be on a single line too.

I was thinking about _conceptual_ difference, not technical.
(Perhaps I should make it more clear.) 

Pathname is item (or part of item in the case of page_path)
which is contained (and must be contained) in single line.
It is also expected (although if we follow this expectation
is up to us) that the pathname would quote special characters
similarly to how shell/operating system quotes pathnames
(e.g. in ls output).

"Blob" output on the other hand ("blob" view and patch part of
"blobdiff" and "commitdiff" views) is [a part of] larger, multiline
whole. One could also expect that special characters would be
quoted like editor quotes special characters. (Of course question
is: which editor?)

This of course is complicate by single line output like subject
or authorship, or signoff, which is not pathname.


All this discussion shows that gitweb quoting is more complicated
that I thought.
-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  9:24   ` Jakub Narebski
@ 2006-11-09  9:55     ` Junio C Hamano
  2006-11-09 10:02       ` Jakub Narebski
  2006-11-10 10:22       ` Luben Tuikov
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2006-11-09  9:55 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Luben Tuikov

Jakub Narebski <jnareb@gmail.com> writes:

> Jakub Narebski wrote:
>> I'm not sure what quoting to choose for esc_attr, but there we could
>> use even --no-control-chars quoting (replacing any control character
>> by '?');  but perhaps in some cases like git_print_page_path
>> subroutine CEC is better.

To be honest, I do not have strong preference between the
escaping style.  If the gitweb cabal feel it is more natural to
see "^L" in blobs and "\f" in path, I will very happily accept
such a patch.

> I'm rambling. esc_attr is special case, because CGI does escapeHTML
> (and I hope also to_utf8) for us. Using <span class="cntrl">...</span>
> has also no sense. So there should be separate esc_attr_path subroutine
> I think.

Yes.  It is unfortunate that there needs different types of
quoting.  I think the first step would be to stop calling
esc_html in esc_path.  I think it was a mistake, and I did not
correct it when I started touching it.

Somehow I ended up spending sizeable part of my git day this
week on fixing up blob/blame/tag/commit message view regarding
this "make controls visible and safe" issues on the 'master'
branch, but I have been consciously staying out of gitweb/ part
of the system, primarily because there are many other people who
are more interested and qualified in it than myself.

I'll step aside and try not to get in the way.  There is another
thing I noticed while testing it with an artifitial test that I
haven't fixed, but I think you already know about it (when the
commitdiff is completely empty except mode changes, we end up
with unbalanced div).  My test's tip can be found at
'gitweb-test-funny-char' branch temporarily in the git.git
repository.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  9:55     ` Junio C Hamano
@ 2006-11-09 10:02       ` Jakub Narebski
  2006-11-09 10:34         ` Junio C Hamano
  2006-11-10 10:22       ` Luben Tuikov
  1 sibling, 1 reply; 12+ messages in thread
From: Jakub Narebski @ 2006-11-09 10:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> [...]  There is another
> thing I noticed while testing it with an artifitial test that I
> haven't fixed, but I think you already know about it (when the
> commitdiff is completely empty except mode changes, we end up
> with unbalanced div).  My test's tip can be found at
> 'gitweb-test-funny-char' branch temporarily in the git.git
> repository.

Damn. I thought I corrected this on resend...

	if ($patch_found) {
		# close extended header for previous empty patch
		if ($in_header) {
			print "</div>\n" # class="diff extended_header"
		}
		# close previous patch
		print "</div>\n"; # class="patch"
	}
-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09 10:02       ` Jakub Narebski
@ 2006-11-09 10:34         ` Junio C Hamano
  2006-11-09 10:41           ` Jakub Narebski
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2006-11-09 10:34 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano wrote:
>> [...]  There is another
>> thing I noticed while testing it with an artifitial test that I
>> haven't fixed, but I think you already know about it (when the
>> commitdiff is completely empty except mode changes, we end up
>> with unbalanced div).  My test's tip can be found at
>> 'gitweb-test-funny-char' branch temporarily in the git.git
>> repository.
>
> Damn. I thought I corrected this on resend...

I think you need this, otherwise when the last filepair changes
only metainfo you fail to close the extended header div.

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1a757cc..e54a29e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2338,6 +2338,8 @@ sub git_patchset_body {
 
 		print format_diff_line($patch_line);
 	}
+	print "</div>\n" if $in_header; # extended header
+
 	print "</div>\n" if $patch_found; # class="patch"
 
 	print "</div>\n"; # class="patchset"

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09 10:34         ` Junio C Hamano
@ 2006-11-09 10:41           ` Jakub Narebski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Narebski @ 2006-11-09 10:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>> Junio C Hamano wrote:
>>> [...]  There is another
>>> thing I noticed while testing it with an artifitial test that I
>>> haven't fixed, but I think you already know about it (when the
>>> commitdiff is completely empty except mode changes, we end up
>>> with unbalanced div).  My test's tip can be found at
>>> 'gitweb-test-funny-char' branch temporarily in the git.git
>>> repository.
>>
>> Damn. I thought I corrected this on resend...
> 
> I think you need this, otherwise when the last filepair changes
> only metainfo you fail to close the extended header div.

True, I forgot about the situation where empty patch is _last_
in the patchset (which includes the situation when it is _only_
patch).

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 1a757cc..e54a29e 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2338,6 +2338,8 @@ sub git_patchset_body {
>  
>  		print format_diff_line($patch_line);
>  	}
> +	print "</div>\n" if $in_header; # extended header
I would write '# class="diff extended_header"' here instead.
> +
>  	print "</div>\n" if $patch_found; # class="patch"
>  
>  	print "</div>\n"; # class="patchset"

Looks good. Ack.

-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] gitweb: protect blob and diff output lines from controls.
  2006-11-09  9:55     ` Junio C Hamano
  2006-11-09 10:02       ` Jakub Narebski
@ 2006-11-10 10:22       ` Luben Tuikov
  1 sibling, 0 replies; 12+ messages in thread
From: Luben Tuikov @ 2006-11-10 10:22 UTC (permalink / raw)
  To: Junio C Hamano, Jakub Narebski; +Cc: git, Petr Baudis, Luben Tuikov

--- Junio C Hamano <junkio@cox.net> wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > Jakub Narebski wrote:
> >> I'm not sure what quoting to choose for esc_attr, but there we could
> >> use even --no-control-chars quoting (replacing any control character
> >> by '?');  but perhaps in some cases like git_print_page_path
> >> subroutine CEC is better.
> 
> To be honest, I do not have strong preference between the
> escaping style.  If the gitweb cabal feel it is more natural to
> see "^L" in blobs and "\f" in path, I will very happily accept
> such a patch.

I've little preference either, as long as the intention
of the original name is preserved across gitweb (to a user's
git-repo/download).

> Yes.  It is unfortunate that there needs different types of
> quoting.  I think the first step would be to stop calling
> esc_html in esc_path.  I think it was a mistake, and I did not
> correct it when I started touching it.

When Jakub mentioned "to_qtext" he meant this patch:
http://marc.theaimsgroup.com/?l=git&m=116016249121781&w=2

   Luben


> Somehow I ended up spending sizeable part of my git day this
> week on fixing up blob/blame/tag/commit message view regarding
> this "make controls visible and safe" issues on the 'master'
> branch, but I have been consciously staying out of gitweb/ part
> of the system, primarily because there are many other people who
> are more interested and qualified in it than myself.
> 
> I'll step aside and try not to get in the way.  There is another
> thing I noticed while testing it with an artifitial test that I
> haven't fixed, but I think you already know about it (when the
> commitdiff is completely empty except mode changes, we end up
> with unbalanced div).  My test's tip can be found at
> 'gitweb-test-funny-char' branch temporarily in the git.git
> repository.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-11-10 10:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-08 23:34 [PATCH] gitweb: protect blob and diff output lines from controls Junio C Hamano
2006-11-09  0:04 ` Jakub Narebski
2006-11-09  0:15   ` Junio C Hamano
2006-11-09  0:46     ` Jakub Narebski
2006-11-09  1:10       ` Junio C Hamano
2006-11-09  9:34         ` Jakub Narebski
2006-11-09  9:24   ` Jakub Narebski
2006-11-09  9:55     ` Junio C Hamano
2006-11-09 10:02       ` Jakub Narebski
2006-11-09 10:34         ` Junio C Hamano
2006-11-09 10:41           ` Jakub Narebski
2006-11-10 10:22       ` Luben Tuikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).