* [PATCH] gitweb: filter escapes from longer commit titles that break firefox
From: Paul Gortmaker @ 2009-04-17 16:24 UTC
  To: git

If there is a commit that ends in ^X and is longer in length than
what will fit in title_short, then it doesn't get fed through
esc_html() and so the ^X will appear as-is in the page source.

When Firefox comes across this, it will fail to display the page,
and only display a couple lines of error messages that read like:

XML Parsing Error: not well-formed
Location: http://git ....

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 gitweb/gitweb.perl | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33ef190..e686e82 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2470,7 +2470,7 @@ sub parse_commit_text {
 	foreach my $title (@commit_lines) {
 		$title =~ s/^    //;
 		if ($title ne "") {
-			$co{'title'} = chop_str($title, 80, 5);
+			$co{'title'} = chop_and_escape_str($title, 80, 5);
 			# remove leading stuff of merges to make the interesting part visible
 			if (length($title) > 50) {
 				$title =~ s/^Automatic //;
-- 
1.6.2.3
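Firefox's "not well-formed" error follows from XML's character rules: XML 1.0 allows only TAB, LF and CR from the C0 control range (0x00-0x1F), so a raw ^X (0x18) anywhere in the page, attribute values included, makes the document ill-formed. The Perl sketch below uses a hypothetical helper, first_invalid_xml_char (not part of gitweb), to report the 1-based column of the first such character in one line of page source. It checks only C0 controls, not the other code points XML 1.0 excludes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based column of the first C0 control
# character that XML 1.0 forbids, or undef when the line is clean.
# TAB (0x09), LF (0x0A) and CR (0x0D) are the only C0 characters XML 1.0 allows.
sub first_invalid_xml_char {
    my ($line) = @_;
    return undef unless $line =~ /[\x00-\x08\x0B\x0C\x0E-\x1F]/g;
    return pos($line);    # pos() points just past the one-character match
}

my $col = first_invalid_xml_char(qq{<a title="embedded at the end\x18">});
printf "invalid character at column %s\n", defined $col ? $col : 'none';
```

The same check run over a saved page is a quick way to find which attribute or text node carries the offending byte.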
* Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox
From: Jakub Narebski @ 2009-04-20 9:32 UTC
  To: Paul Gortmaker; +Cc: git

Paul Gortmaker <paul.gortmaker@windriver.com> writes:

> If there is a commit that ends in ^X and is longer in length than
> what will fit in title_short, then it doesn't get fed through
> esc_html() and so the ^X will appear as-is in the page source.
>
> When Firefox comes across this, it will fail to display the page,
> and only display a couple lines of error messages that read like:
>
> XML Parsing Error: not well-formed
> Location: http://git ....
>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

This is an issue for when project doesn't follow sanity (control
characters in commit message) nor commit message conventions of git
(limiting length of first line of commit message to 60-70 characters).

But I do not think that the solution presented here is good solution
for this problem. chop_and_escape_str is meant as _output_ filter,
because it generates (can generate) fragment of HTML. It is not a
good solution to use it for shortening in intermediate representation
of %co{'title'}.

And I think that issue might be a bug elsewhere in gitweb if we have
text output which is not passed through esc_html... or bug in CGI.pm
if the error is in not escaping of -title _attribute_ (attribute
escaping has slightly different rules than escaping HTML, and should
be done automatically by CGI.pm).


So thanks for noticing the issue, but NAK on the solution.
> ---
>  gitweb/gitweb.perl | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33ef190..e686e82 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2470,7 +2470,7 @@ sub parse_commit_text {
>  	foreach my $title (@commit_lines) {
>  		$title =~ s/^    //;
>  		if ($title ne "") {
> -			$co{'title'} = chop_str($title, 80, 5);
> +			$co{'title'} = chop_and_escape_str($title, 80, 5);
>  			# remove leading stuff of merges to make the interesting part visible
>  			if (length($title) > 50) {
>  				$title =~ s/^Automatic //;
> --
> 1.6.2.3
>

-- 
Jakub Narebski
Poland
ShadeHawk on #git
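Jakub's objection to using an output filter in the intermediate representation can be seen in miniature. If the HTML produced by the filter is stored in $co{'title'}, CGI.pm's automatic attribute escaping runs over it a second time, so the tooltip shows the markup literally; this is consistent with the source-ok page quoted later in the thread, whose title attribute carries a <span class="cntrl"> fragment. The sketch uses two hypothetical helpers: minimal_attr_escape re-creates only the four replacements the thread attributes to CGI.pm's simple_escape, and output_filter is a simplified stand-in for esc_html's control-character handling. Neither is the real module or gitweb code.

```perl
use strict;
use warnings;

# Hypothetical re-creation of minimal attribute escaping as described in the
# thread: only & " < > are replaced; control characters pass through untouched.
sub minimal_attr_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # first, so the entities added below are not re-escaped
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical, simplified output filter in the spirit of esc_html: it returns
# an HTML fragment, wrapping each control character except TAB and LF in a span.
my $cntrl_fmt = '<span class="cntrl">\\%x</span>';    # renders 0x18 as \18
sub output_filter {
    my ($s) = @_;
    $s = minimal_attr_escape($s);
    $s =~ s/([\x00-\x08\x0B-\x1F])/sprintf($cntrl_fmt, ord $1)/ge;
    return $s;
}

my $raw = "embedded at the end\x18";

# Escaping once at output time leaves the raw 0x18 in place: still not well-formed.
my $once = minimal_attr_escape($raw);

# Storing the filtered HTML and escaping it again exposes the markup in the tooltip.
my $twice = minimal_attr_escape(output_filter($raw));
print "$twice\n";
```

Neither path yields a correct attribute: the first keeps the invalid byte, the second leaks markup, which is why the thread turns to escaping attributes separately from HTML content.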
* Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox
From: Paul Gortmaker @ 2009-04-20 13:29 UTC
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
> Paul Gortmaker <paul.gortmaker@windriver.com> writes:
>
>
>> If there is a commit that ends in ^X and is longer in length than
>> what will fit in title_short, then it doesn't get fed through
>> esc_html() and so the ^X will appear as-is in the page source.
>>
>> When Firefox comes across this, it will fail to display the page,
>> and only display a couple lines of error messages that read like:
>>
>> XML Parsing Error: not well-formed
>> Location: http://git ....
>>
>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>>
>
> This is an issue for when project doesn't follow sanity (control
> characters in commit message) nor commit message conventions of git
> (limiting length of first line of commit message to 60-70 characters).
>

I agree - the situation should be that it doesn't happen, but it can happen
(and it did happen) that a novice, or a simple mistake ends up with such a
commit.

> But I do not think that the solution presented here is good solution
> for this problem. chop_and_escape_str is meant as _output_ filter,
> because it generates (can generate) fragment of HTML. It is not a
> good solution to use it for shortening in intermediate representation
> of %co{'title'}.
>
> And I think that issue might be a bug elsewhere in gitweb if we have
> text output which is not passed through esc_html... or bug in CGI.pm
> if the error is in not escaping of -title _attribute_ (attribute
> escaping has slightly different rules than escaping HTML, and should
> be done automatically by CGI.pm).
>
>
> So thanks for noticing the issue, but NAK on the solution.
>

Fair enough -- I wasn't familiar with the code in there, and there
wasn't really any indication that it was for output only. I can easily
believe that there is a better place for it -- I just didn't see where
any global esc_html filtering was taking place...

Paul.

>
>> ---
>>  gitweb/gitweb.perl | 2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 33ef190..e686e82 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2470,7 +2470,7 @@ sub parse_commit_text {
>>  	foreach my $title (@commit_lines) {
>>  		$title =~ s/^    //;
>>  		if ($title ne "") {
>> -			$co{'title'} = chop_str($title, 80, 5);
>> +			$co{'title'} = chop_and_escape_str($title, 80, 5);
>>  			# remove leading stuff of merges to make the interesting part visible
>>  			if (length($title) > 50) {
>>  				$title =~ s/^Automatic //;
>> --
>> 1.6.2.3
>>
* Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox
From: Jakub Narebski @ 2009-04-24 17:53 UTC
  To: Paul Gortmaker; +Cc: git

On Mon, 20 April 2009, Paul Gortmaker wrote:
> Jakub Narebski wrote:
>> Paul Gortmaker <paul.gortmaker@windriver.com> writes:
>>
>>
>>> If there is a commit that ends in ^X and is longer in length than
>>> what will fit in title_short, then it doesn't get fed through
>>> esc_html() and so the ^X will appear as-is in the page source.
>>>
>>> When Firefox comes across this, it will fail to display the page,
>>> and only display a couple lines of error messages that read like:
>>>
>>> XML Parsing Error: not well-formed
>>> Location: http://git ....
>>>
>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>
>> But I do not think that the solution presented here is good solution
>> for this problem. chop_and_escape_str is meant as _output_ filter,
>> because it generates (can generate) fragment of HTML. It is not a
>> good solution to use it for shortening in intermediate representation
>> of %co{'title'}.
>>
>> And I think that issue might be a bug elsewhere in gitweb if we have
>> text output which is not passed through esc_html... or bug in CGI.pm
>> if the error is in not escaping of -title _attribute_ (attribute
>> escaping has slightly different rules than escaping HTML, and should
>> be done automatically by CGI.pm).
>>
>>
>> So thanks for noticing the issue, but NAK on the solution.
>
> Fair enough -- I wasn't familiar with the code in there, and there
> wasn't really any indication that it was for output only. I can easily
> believe that there is a better place for it -- I just didn't see where
> any global esc_html filtering was taking place...

The name chop_and_escape_str for this subroutine is not a very good
name; it rather should follow format_* as a naming convention for this
subroutine.

What is more important: can you find out in more detail _where_
an error (unescaped control character) occurs: is it tag contents or
'title' attribute for some tag, what tag is it (name and class), in
what view or views this bug is present, and in which part this occurs?
Without those details it would be much harder to diagnose this bug...

-- 
Jakub Narebski
Poland
* Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox
From: Paul Gortmaker @ 2009-04-24 19:48 UTC
  To: Jakub Narebski; +Cc: git

[Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox] On 24/04/2009 (Fri 19:53) Jakub Narebski wrote:

> On Mon, 20 April 2009, Paul Gortmaker wrote:
> > Jakub Narebski wrote:
> >> Paul Gortmaker <paul.gortmaker@windriver.com> writes:
> >>
> >>
> >>> If there is a commit that ends in ^X and is longer in length than
> >>> what will fit in title_short, then it doesn't get fed through
> >>> esc_html() and so the ^X will appear as-is in the page source.
> >>>
> >>> When Firefox comes across this, it will fail to display the page,
> >>> and only display a couple lines of error messages that read like:
> >>>
> >>> XML Parsing Error: not well-formed
> >>> Location: http://git ....
> >>>
> >>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>
> >> But I do not think that the solution presented here is good solution
> >> for this problem. chop_and_escape_str is meant as _output_ filter,
> >> because it generates (can generate) fragment of HTML. It is not a
> >> good solution to use it for shortening in intermediate representation
> >> of %co{'title'}.
> >>
> >> And I think that issue might be a bug elsewhere in gitweb if we have
> >> text output which is not passed through esc_html... or bug in CGI.pm
> >> if the error is in not escaping of -title _attribute_ (attribute
> >> escaping has slightly different rules than escaping HTML, and should
> >> be done automatically by CGI.pm).
> >>
> >>
> >> So thanks for noticing the issue, but NAK on the solution.
> >
> > Fair enough -- I wasn't familiar with the code in there, and there
> > wasn't really any indication that it was for output only. I can easily
> > believe that there is a better place for it -- I just didn't see where
> > any global esc_html filtering was taking place...
>
> The name chop_and_escape_str for this subroutine is not a very good
> name; it rather should follow format_* as a naming convention for this
> subroutine.
>
> What is more important: can you find out in more detail _where_
> an error (unescaped control character) occurs: is it tag contents or
> 'title' attribute for some tag, what tag is it (name and class), in
> what view or views this bug is present, and in which part this occurs?
> Without those details it would be much harder to diagnose this bug...

No problem -- It appears to be in the title attribute, and it appears
straight away when I go to the toplevel view of the repo, assuming that
the commit is within the top 10 recent commits that are shown on the
summary page. I've put more details below on how I can reproduce it
and the page source deltas -- hopefully this will help. If there is
something else I can provide that would help, don't hesitate to ask.

Paul.

-------

Setup:

yow-d4:test$mkdir bad_commit
yow-d4:test$cd bad_commit/
yow-d4:bad_commit$git init
Initialized empty Git repository in /home/pgortmak/test/bad_commit/.git/
yow-d4:bad_commit$echo bbb > bbb
yow-d4:bad_commit$git add bbb
yow-d4:bad_commit$git commit -m 'some string that is longer than roughly 50chars, with a ^X embedded at the end^X'
[master (root-commit) 8735814] some string that is longer than roughly 50chars, with a ^X embedded at the end
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 bbb
yow-d4:bad_commit$

--------

I've used ^V^X to embed the ^X at the end of the commit message above;
the other "^X" is just literally a ^ followed by a X.

Then I load it with firefox (default shipping with Ubuntu Jaunty). With
my workaround patch sent previously, it renders OK, and I save the
source to "source-ok".
Then I take out my hack patch, and it fails to render, instead giving:

------
XML Parsing Error: not well-formed
Location: http://yow-somehost.com/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=summary
Line Number 54, Column 114:<td><a class="list subject" title="some string that is longer than roughly 50chars, with a ^X embedded at the end" href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=commit;h=8735814a15cf930c48fd33563f5922a103b6b4ea">some string that is longer than roughly 50chars, with... <span class="refs"> <span class="head" title="heads/master"><a href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=shortlog;h=refs/heads/master">master</a></span></span></a></td>
-----------------------------------------------------------------------------------------------------------------^

You probably can't tell in mail, but the line of dashes and the caret
are pointing at firefox's rendering of the ^X at EOL, which is
displayed as a little square box with a 00 above an 18 inside it.

If I save this page to source-bad, and then diff the two, I get:

--- source-ok 2009-04-24 15:29:11.000000000 -0400
+++ source-bad 2009-04-24 15:30:20.000000000 -0400
@@ -54,9 +54,9 @@
 </div>
 <table class="shortlog">
 <tr class="dark">
-<td title="2009-04-24"><i>2 min ago</i></td>
+<td title="2009-04-24"><i>6 min ago</i></td>
 <td><i>Paul Gortmaker</i></td>
-<td><a class="list subject" title="some string that is longer than roughly 50chars, with a ^X embedded at the end<span class="cntrl">\18</span>" href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=commit;h=8735814a15cf930c48fd33563f5922a103b6b4ea">some string that is longer than roughly 50chars, with... <span class="refs"> <span class="head" title="heads/master"><a href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=shortlog;h=refs/heads/master">master</a></span></span></a></td>
+<td><a class="list subject" title="some string that is longer than roughly 50chars, with a ^X embedded at the end" href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=commit;h=8735814a15cf930c48fd33563f5922a103b6b4ea">some string that is longer than roughly 50chars, with... <span class="refs"> <span class="head" title="heads/master"><a href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=shortlog;h=refs/heads/master">master</a></span></span></a></td>
 <td class="link"><a href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=commit;h=8735814a15cf930c48fd33563f5922a103b6b4ea">commit</a> | <a href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=commitdiff;h=8735814a15cf930c48fd33563f5922a103b6b4ea">commitdiff</a> | <a href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=tree;h=8735814a15cf930c48fd33563f5922a103b6b4ea;hb=8735814a15cf930c48fd33563f5922a103b6b4ea">tree</a> | <a title="in format: tar.gz" href="/gitweb/gitweb.cgi?p=local/pgortmak/test/bad_commit/.git;a=snapshot;h=8735814a15cf930c48fd33563f5922a103b6b4ea;sf=tgz">snapshot</a></td>
 </tr>
* Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox
From: Jakub Narebski @ 2009-04-24 22:10 UTC
  To: Paul Gortmaker; +Cc: git

On Fri, 24 April 2009, Paul Gortmaker wrote:
> [Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox] On 24/04/2009 (Fri 19:53) Jakub Narebski wrote:
>> On Mon, 20 April 2009, Paul Gortmaker wrote:
>>> Jakub Narebski wrote:
>>>> Paul Gortmaker <paul.gortmaker@windriver.com> writes:
>>>>
>>>>
>>>>> If there is a commit that ends in ^X and is longer in length than
>>>>> what will fit in title_short, then it doesn't get fed through
>>>>> esc_html() and so the ^X will appear as-is in the page source.
>>>>>
>>>>> When Firefox comes across this, it will fail to display the page,
>>>>> and only display a couple lines of error messages that read like:
>>>>>
>>>>> XML Parsing Error: not well-formed
>>>>> Location: http://git ....
>>>>>
>>>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>
>>>> And I think that issue might be a bug elsewhere in gitweb if we have
>>>> text output which is not passed through esc_html... or bug in CGI.pm
>>>> if the error is in not escaping of -title _attribute_ (attribute
>>>> escaping has slightly different rules than escaping HTML, and should
>>>> be done automatically by CGI.pm).
>
>> What is more important: can you find out in more detail _where_
>> an error (unescaped control character) occurs: is it tag contents or
>> 'title' attribute for some tag, what tag is it (name and class), in
>> what view or views this bug is present, and in which part this occurs?
>> Without those details it would be much harder to diagnose this bug...
>
> No problem -- It appears to be in the title attribute, and it appears
> straight away when I go to the toplevel view of the repo, assuming that
> the commit is within the top 10 recent commits that are shown on the
> summary page. I've put more details below on how I can reproduce it
> and the page source deltas -- hopefully this will help. If there is
> something else I can provide that would help, don't hesitate to ask.

Ahh... that is what I thought.

The problem that we have to solve to fix this bug is twofold:

 * CGI.pm by default does slight escaping (simple_escape from CGI::Util)
   of _attribute_ values, but for obvious reasons it cannot do
   unconditional escaping of tag _contents_ (because they can be HTML
   themselves).

   This escaping, at least in CGI.pm version 3.10 (the most current
   version on CPAN is 3.43), is minimal: only '"', '&', '<' and '>' are
   escaped, using named HTML entity references (&quot;, &amp;, &lt; and
   &gt; respectively). simple_escape does not escape control characters
   such as ^X, which are invalid in XHTML (in strict mode). Note that
   IIRC escaping '<' and '>' in attributes is not strictly necessary.

   Gitweb relies on CGI.pm escaping attribute values. As things are now,
   we cannot escape attribute values ourselves (e.g. the "title"
   attribute holding the (almost) full commit subject), because CGI.pm
   would escape them a second time. Fortunately it is possible to turn
   off autoescaping with $cgi->autoEscape(undef); note however that we
   would then have to do attribute escaping ourselves in the scope of
   this declaration.

 * The rules for escaping attribute values are slightly different from
   the rules for escaping HTML. For attribute values we have to escape
   '"' because it is the attribute delimiter, and '&' because it is the
   escape character; escaping '<' and '>' is not strictly necessary.
   For escaping HTML we need to escape '<' and '>' because they
   introduce tags, and '&' because it is the escape character; escaping
   '"' is not strictly necessary.
   It does not make sense to replace spaces by &nbsp; in attribute
   values, although it shouldn't harm. OTOH we should perhaps escape
   newlines in attribute values.

   For esc_html and esc_path we (currently) replace control characters
   by character escape codes (e.g. "\f" for form-feed, "\0" for NUL,
   hexadecimal escapes for 'other' control characters). But it is not
   the only possible solution. We can use the Unicode printable
   representation of control characters instead (the Control Pictures
   block starting at U+2400). Or we can use the control key sequence /
   caret notation, e.g. ^X for \x18, or ^L for "\f".

   We probably should discuss this in more detail.

So it is not that simple...

P.S. The subject (one line summary of this change) should also be
changed, for example to "gitweb: escape control characters in
attributes", and in the commit message itself you should explain that
control characters break rendering in Firefox in strict XML compliance
mode... or something like that.

-- 
Jakub Narebski
Poland
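The three representations Jakub lists for a control character can be compared side by side. The helpers below (c_style, control_picture, caret) are hypothetical and only illustrate the notations; they are not gitweb's quot_cec or quot_upr. The C-style form uses a named escape where one exists and hexadecimal otherwise, which gives the \18 seen in the page source earlier in the thread; the Unicode form emits an HTML numeric character reference into the Control Pictures block (U+2400 plus the code, with DEL mapped to U+2421); caret notation flips bit 0x40 of the code.

```perl
use strict;
use warnings;

# Hypothetical helpers showing three notations for one control character.

# C-style escape: named where one exists, hexadecimal otherwise.
my %named = ("\0" => '\0', "\a" => '\a', "\b" => '\b', "\t" => '\t',
             "\n" => '\n', "\f" => '\f', "\r" => '\r', "\e" => '\e');
sub c_style {
    my ($c) = @_;
    return exists $named{$c} ? $named{$c} : sprintf('\\%02x', ord $c);
}

# Unicode "printable representation" from the Control Pictures block, written
# as a numeric character reference: U+2400..U+241F for 0x00..0x1F, U+2421 for DEL.
sub control_picture {
    my ($c) = @_;
    my $cp = ord($c) == 0x7F ? 0x2421 : 0x2400 + ord($c);
    return sprintf('&#x%04X;', $cp);
}

# Caret notation: 0x18 -> ^X, 0x0C -> ^L, 0x00 -> ^@, 0x7F -> ^?.
sub caret {
    my ($c) = @_;
    return '^' . chr(ord($c) ^ 0x40);
}

for my $c ("\x18", "\f") {
    printf "%-6s %-10s %s\n", c_style($c), control_picture($c), caret($c);
}
```

The Control Pictures form stays a single visible glyph and needs no extra markup, while the other two are plain ASCII and survive any font or encoding.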
* Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox
From: Jakub Narebski @ 2009-04-25 9:04 UTC
  To: Paul Gortmaker; +Cc: git

On Sat, 25 April 2009, Jakub Narebski wrote:

> So it is not that simple...

That said, here is a simple patch which should fix the bug you found.
Unlike your patch, it always creates sensible short and long values
(take a look at gitweb output after your patch, including tooltips on
mouseover). But it is NOT TESTED whether it works correctly and whether
it covers all occurrences. And it might not be necessary in all its
complication: we could simply replace control characters by '?', like
in the chop_and_escape_str subroutine (which would also make gitweb
more consistent). It also lacks a commit message.

Nevertheless it might be a good bandaid for your problem:

-- >8 --
diff --git c/gitweb/gitweb.perl w/gitweb/gitweb.perl
index 3f99361..8575d5f 100755
--- c/gitweb/gitweb.perl
+++ w/gitweb/gitweb.perl
@@ -1035,6 +1035,24 @@ sub esc_url {
 	return $str;
 }
 
+# quote and escape tag attribute values; autoEscape has to be turned off
+sub esc_attr {
+	my $str = shift;
+	return $str unless defined $str;
+
+	my %ent = ( # named HTML entities
+		'"' => '&quot;',
+		'&' => '&amp;',
+		'<' => '&lt;',
+		'>' => '&gt;',
+	);
+	$str = to_utf8($str);
+	$str =~ s|([\"&<>])|$ent{$1}|eg;
+	$str =~ s|([[:cntrl:]])|(($1 ne "\t") ? quot_upr($1) : $1)|eg;
+
+	return $str;
+}
+
 # replace invalid utf8 character with SUBSTITUTION sequence
 sub esc_html ($;%) {
 	my $str = shift;
@@ -1457,14 +1475,19 @@ sub format_subject_html {
 	my ($long, $short, $href, $extra) = @_;
 	$extra = '' unless defined($extra);
 
+	my $ret = '';
 	if (length($short) < length($long)) {
-		return $cgi->a({-href => $href, -class => "list subject",
-		                -title => to_utf8($long)},
+		my $autoescape = $cgi->autoEscape(undef);
+		# or just replace s/([[:cntrl:]])/?/g in -title
+		$ret = $cgi->a({-href => $href, -class => "list subject",
+		                -title => esc_attr($long)},
 		       esc_html($short) . $extra);
+		$cgi->autoEscape($autoescape); # restore original value
 	} else {
-		return $cgi->a({-href => $href, -class => "list subject"},
+		$ret = $cgi->a({-href => $href, -class => "list subject"},
 		       esc_html($long) . $extra);
 	}
+	return $ret;
 }
 
 # format git diff header line, i.e. "diff --(git|combined|cc) ..."
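Jakub also mentions a simpler alternative to calling quot_upr inside esc_attr: replace control characters by '?'. A minimal sketch of that variant follows. The name esc_attr_simple is hypothetical, the code uses only core Perl, and it assumes, as the patch does, that CGI.pm autoescaping is switched off around the call so that entities are produced exactly once. Like the patch it leaves TAB alone; unlike the patch it also turns newlines into '?', which is one way to act on the "perhaps escape newlines" remark.

```perl
use strict;
use warnings;

# Hypothetical esc_attr_simple: the "replace control characters by '?'" variant.
# Assumes CGI.pm autoEscape is off, so it must escape '&' and '"' itself
# (and '<', '>' for good measure), exactly once.
my %attr_entity = ('&' => '&amp;', '"' => '&quot;', '<' => '&lt;', '>' => '&gt;');

sub esc_attr_simple {
    my ($str) = @_;
    return $str unless defined $str;
    $str =~ s/([&"<>])/$attr_entity{$1}/g;
    # TAB is legal in XML; every other control character, newline included,
    # becomes '?'.
    $str =~ s/([[:cntrl:]])/$1 eq "\t" ? $1 : '?'/ge;
    return $str;
}

print esc_attr_simple(qq{a "b" & <c>\x18}), "\n";
```

In format_subject_html it would take the place of esc_attr($long), between $cgi->autoEscape(undef) and the call that restores the saved setting.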
Thread overview: 7+ messages
2009-04-17 16:24 [PATCH] gitweb: filter escapes from longer commit titles that break firefox Paul Gortmaker
2009-04-20 9:32 ` Jakub Narebski
2009-04-20 13:29   ` Paul Gortmaker
2009-04-24 17:53     ` Jakub Narebski
2009-04-24 19:48       ` Paul Gortmaker
2009-04-24 22:10         ` Jakub Narebski
2009-04-25 9:04           ` Jakub Narebski