* [PATCH] Do not chop HTML tags in commit search result @ 2008-02-13 17:37 Jean-Baptiste Quenot 2008-02-13 19:16 ` Jakub Narebski 2008-02-13 19:43 ` Junio C Hamano 0 siblings, 2 replies; 16+ messages in thread From: Jean-Baptiste Quenot @ 2008-02-13 17:37 UTC (permalink / raw) To: git Hi there, Thanks for Git! It's a great program. I encountered an annoying bug with gitweb 1.5.4.1, when searching for commits, if the search string is too long, the generated HTML is munged leading to an ill-formed XHTML document. Here is the patch, hope it helps: diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index ae2d057..2c0b990 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3780,7 +3780,10 @@ sub git_search_grep_body { my $trail = esc_html($3) || ""; $trail = chop_str($trail, 30, 10); my $text = "$lead<span class=\"match\">$match</span>$trail"; - print chop_str($text, 80, 5) . "<br/>\n"; + # Do not chop $text as match can be long, and we don't want to + # munge HTML tags! + #print chop_str($text, 80, 5) . "<br/>\n"; + print $text . "<br/>\n"; } } print "</td>\n" . -- 1.5.4.1-dirty ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] Do not chop HTML tags in commit search result 2008-02-13 17:37 [PATCH] Do not chop HTML tags in commit search result Jean-Baptiste Quenot @ 2008-02-13 19:16 ` Jakub Narebski 2008-02-13 19:43 ` Junio C Hamano 1 sibling, 0 replies; 16+ messages in thread From: Jakub Narebski @ 2008-02-13 19:16 UTC (permalink / raw) To: Jean-Baptiste Quenot; +Cc: git "Jean-Baptiste Quenot" <jbq@caraldi.com> writes: > Thanks for Git! It's a great program. I encountered an annoying bug > with gitweb 1.5.4.1, when searching for commits, if the search string > is too long, the generated HTML is munged leading to an ill-formed > XHTML document. > > Here is the patch, hope it helps: It would be better if the patch wasn't wrapped and whitespace corrupted. > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index ae2d057..2c0b990 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3780,7 +3780,10 @@ sub git_search_grep_body { > my $trail = esc_html($3) || ""; > $trail = chop_str($trail, 30, 10); > my $text = "$lead<span class=\"match\">$match</span>$trail"; > - print chop_str($text, 80, 5) . "<br/>\n"; > + # Do not chop $text as match can be long, and we don't want to > + # munge HTML tags! > + #print chop_str($text, 80, 5) . "<br/>\n"; > + print $text . "<br/>\n"; > } > } > print "</td>\n" . While this might be good bigfix, I think it is not a good solution. If we select to show only neighbourhood of the match, we should probably chop trailing text, or both leading (cutting at beginning) and trailing, perhaps even match if it is overly long. Or we could alternatively show all lines of commit message, and only mark matching fragment... but that I think would have to wait for refactoring of generation of log-like views (log, shorlog, history, rss) in gitweb. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Do not chop HTML tags in commit search result 2008-02-13 17:37 [PATCH] Do not chop HTML tags in commit search result Jean-Baptiste Quenot 2008-02-13 19:16 ` Jakub Narebski @ 2008-02-13 19:43 ` Junio C Hamano 2008-02-22 16:33 ` [PATCH] gitweb: Better chopping in commit search results Jakub Narebski 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2008-02-13 19:43 UTC (permalink / raw) To: Jean-Baptiste Quenot; +Cc: git "Jean-Baptiste Quenot" <jbq@caraldi.com> writes: > ... I encountered an annoying bug > with gitweb 1.5.4.1, when searching for commits, if the search string > is too long, the generated HTML is munged leading to an ill-formed > XHTML document. > Here is the patch, hope it helps: > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index ae2d057..2c0b990 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3780,7 +3780,10 @@ sub git_search_grep_body { > my $trail = esc_html($3) || ""; > $trail = chop_str($trail, 30, 10); > ... > my $text = "$lead<span > class=\"match\">$match</span>$trail"; > - print chop_str($text, 80, 5) . "<br/>\n"; I think esc_html() and chop_str() are backwards here. If $3 is overlong it is cut in the middle of some markup. Even though chop_str() claims to be "HTML aware", I do not think it is. It seems to know about "&entities;" but not mark-ups. There are quite a many instances of esc_html() first then chop_str() in that function, and I think they all deserve to be fixed. my $comment = $co{'comment'}; foreach my $line (@$comment) { if ($line =~ m/^(.*)($search_regexp)(.*)$/i) { my $lead = esc_html($1) || ""; $lead = chop_str($lead, 30, 10); my $match = esc_html($2) || ""; my $trail = esc_html($3) || ""; $trail = chop_str($trail, 30, 10); my $text = "$lead<span class=\"match\">$match</span>$trail"; print chop_str($text, 80, 5) . "<br/>\n"; } } I think this is trying to fit the result on a line, showing the match sandwitched by not-too-long part taken from leading and trailing context ($lead and $trail can be chomped aggressively than $match). But $lead and $trail are escaped then chomped which is already wrong. I think the body of that if() would be better written like this: my ($lead, $match, $trail) = ($1, $2, $3); $match = chop_str($match, 70, $slop); # in case it is very long... $contextlen = (80 - len($match)) / 2; # and the remainder... if ($contextlen > 30) { $contextlen = 30 }; # but not too much $trail = chop_str($trail, $contextlen, $slop); $lead = chop_str($lead, $contextlen, $slop); $lead = esc_html($lead); $match = esc_html($match); $trail = esc_html($trail); print "$lead<span ...>$match</span>$trail"; ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] gitweb: Better chopping in commit search results 2008-02-13 19:43 ` Junio C Hamano @ 2008-02-22 16:33 ` Jakub Narebski 2008-02-22 17:14 ` Junio C Hamano 2008-02-23 9:27 ` [PATCH] gitweb: Better chopping in commit search results Karl Hasselström 0 siblings, 2 replies; 16+ messages in thread From: Jakub Narebski @ 2008-02-22 16:33 UTC (permalink / raw) To: git; +Cc: Jean-Baptiste Quenot, Junio C Hamano, Jakub Narebski From: Junio C Hamano <gitster@pobox.com> Subject: [PATCH] gitweb: Better chopping in commit search results When searching commit messages (commit search), if matched string is too long, the generated HTML was munged leading to an ill-formed XHTML document. Now gitweb chop leading, trailing and matched parts, HTML escapes those parts, then composes and marks up match info. HTML output is never chopped. Limiting matched info to 80 columns (with slop) is now done by dividing remaining characters after chopping match equally to leading and trailing part, not by chopping composed and HTML marked output. Noticed-by: Jean-Baptiste Quenot <jbq@caraldi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This is just slightly reworked Junio's patch; probably should be marked as from Junio, so I'm trying to send it as it. Strange that StGit always sends patches (stg mail) as if repo owner was their author, regardless of path/commit author (I think; unless "stg edit" cannot change authorship). gitweb/gitweb.perl | 24 +++++++++++++++--------- 1 files changed, 15 insertions(+), 9 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 8ed6d04..326e27c 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3784,18 +3784,24 @@ sub git_search_grep_body { print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . "<td><i>" . $author . "</i></td>\n" . "<td>" . - $cgi->a({-href => href(action=>"commit", hash=>$co{'id'}), -class => "list subject"}, - chop_and_escape_str($co{'title'}, 50) . "<br/>"); + $cgi->a({-href => href(action=>"commit", hash=>$co{'id'}), + -class => "list subject"}, + chop_and_escape_str($co{'title'}, 50) . "<br/>"); my $comment = $co{'comment'}; foreach my $line (@$comment) { if ($line =~ m/^(.*)($search_regexp)(.*)$/i) { - my $lead = esc_html($1) || ""; - $lead = chop_str($lead, 30, 10); - my $match = esc_html($2) || ""; - my $trail = esc_html($3) || ""; - $trail = chop_str($trail, 30, 10); - my $text = "$lead<span class=\"match\">$match</span>$trail"; - print chop_str($text, 80, 5) . "<br/>\n"; + my ($lead, $match, $trail) = ($1, $2, $3); + $match = chop_str($match, 70, 5); # in case match is very long + my $contextlen = (80 - len($match))/2; # is left for the remainder + $contextlen = 30 if ($contextlen > 30); # but not too much + $lead = chop_str($lead, $contextlen, 10); + $trail = chop_str($trail, $contextlen, 10); + + $lead = esc_html($lead); + $match = esc_html($match); + $trail = esc_html($trail); + + print "$lead<span class=\"match\">$match</span>$trail<br />"; } } print "</td>\n" . ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] gitweb: Better chopping in commit search results 2008-02-22 16:33 ` [PATCH] gitweb: Better chopping in commit search results Jakub Narebski @ 2008-02-22 17:14 ` Junio C Hamano 2008-02-22 17:49 ` Jakub Narebski 2008-02-23 9:27 ` [PATCH] gitweb: Better chopping in commit search results Karl Hasselström 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2008-02-22 17:14 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Jean-Baptiste Quenot Jakub Narebski <jnareb@gmail.com> writes: > From: Junio C Hamano <gitster@pobox.com> > Subject: [PATCH] gitweb: Better chopping in commit search results > > When searching commit messages (commit search), if matched string is > too long, the generated HTML was munged leading to an ill-formed XHTML > document. > > Now gitweb chop leading, trailing and matched parts, HTML escapes > those parts, then composes and marks up match info. HTML output is > never chopped. Limiting matched info to 80 columns (with slop) is now > done by dividing remaining characters after chopping match equally to > leading and trailing part, not by chopping composed and HTML marked > output. Could somebody test this with very long search string, as that was how the issue initially came up, to see (1) if it really fixes the "mark-up chopped in the middle" issue, (2) and how the actual output looks like? Regarding the latter, I have a slight suspicion that chopping the tail of the middle part and showing very little context may not produce a very useful output. For example, if you are looking for "very long ... and how" in the first paragraph of message (if it were all on a single line), wouldn't you want to see: ...st this with <<very long ... and how>> the actual out... rather than: Could som... <<very long search stri...>> the actual out... in the result? That is, it any chopping ever needs to happen, I suspect a more useful way to shorten the output would be to: - divide the available space to give enough space to give context for head and tail part. - chop head from the left, if needed, with leading ellipsis; - chop tail from the right, if needed, with trailing ellipsis; - chop search string from both ends, if needed, with leading and trailing ellipses. Hmm? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] gitweb: Better chopping in commit search results 2008-02-22 17:14 ` Junio C Hamano @ 2008-02-22 17:49 ` Jakub Narebski 2008-02-22 19:14 ` Jakub Narebski 0 siblings, 1 reply; 16+ messages in thread From: Jakub Narebski @ 2008-02-22 17:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jean-Baptiste Quenot On Fri, 22 Feb 2008, Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > >> From: Junio C Hamano <gitster@pobox.com> >> Subject: [PATCH] gitweb: Better chopping in commit search results >> >> When searching commit messages (commit search), if matched string is >> too long, the generated HTML was munged leading to an ill-formed XHTML >> document. >> >> Now gitweb chop leading, trailing and matched parts, HTML escapes >> those parts, then composes and marks up match info. HTML output is >> never chopped. Limiting matched info to 80 columns (with slop) is now >> done by dividing remaining characters after chopping match equally to >> leading and trailing part, not by chopping composed and HTML marked >> output. > > Could somebody test this with very long search string, as that > was how the issue initially came up, to see (1) if it really > fixes the "mark-up chopped in the middle" issue, (2) [...] The bug in question was cause by the chop _after_ doing HTML markup. Now gitweb chops, then HTML escapes, and chops no more. There is no way this bug can happen now. BTW if commit messages follows "wrap at 76 column" convention it is not easy to test this condition... :-) But you are right that output should be improved... > For example, if you are looking for "very long ... and how" > in the first paragraph of message (if it were all on a single > line), wouldn't you want to see: > > ...st this with <<very long ... and how>> the actual out... > > rather than: > > Could som... <<very long search stri...>> the actual out... > > in the result? ...but I think it is better left for another patch. P.S. When testing this commit I have noticed that currently, probably due to some misquoting, or interaction between escapemeta and quoting, searching for messages which contain "'" (apostrophe), e.g. "don't" currently doesn't work. Will investigate... -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] gitweb: Better chopping in commit search results 2008-02-22 17:49 ` Jakub Narebski @ 2008-02-22 19:14 ` Jakub Narebski 2008-02-23 21:44 ` [RFC/PATCH] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Jakub Narebski @ 2008-02-22 19:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jean-Baptiste Quenot Jakub Narebski wrote: > On Fri, 22 Feb 2008, Junio C Hamano wrote: >> For example, if you are looking for "very long ... and how" >> in the first paragraph of message (if it were all on a single >> line), wouldn't you want to see: >> >> ...st this with <<very long ... and how>> the actual out... >> >> rather than: >> >> Could som... <<very long search stri...>> the actual out... >> >> in the result? > > ...but I think it is better left for another patch. End here is proposed improved chop_str which can do chopping at beginning, in the middle, and (as it used to do) at the end. Some questions about the code: * should we divide slop in two also when chopping in the middle? * what should extra option be named, and what should be names of posible values of this option (the option deciding where to chop) * $add_len has default value if not provided, or if 0 (!), or if ''; you have to use chop_str($str, 20, undef, -pos=>'center') trick to use it with extra options. * can the code be improved? I'm not Perl expert. -- >8 -- sub chop_str { my $str = shift; my $len = shift; my $add_len = shift || 10; # supported opts: # * -pos => 'left' | 'center' | 'right', defaults to 'right' # denotes where (which part) to chop my %opts = @_; # allow only $len chars, but don't cut a word if it would fit in $add_len # if it doesn't fit, cut it if it's still longer than the dots we would add # remove chopped character entities entirely # when chopping in the middle, distribute $len into left and right part if (defined $opts{'-pos'} && $opts{'-pos'} eq 'center') { $len = int($len/2); } # regexps: ending and beginning with word part up to $add_len my $endre = qr/.{0,$len}[^ \/\-_:\.@]{0,$add_len}/; my $begre = qr/[^ \/\-_:\.@]{0,$add_len}.{0,$len}/; if (defined $opts{'-pos'} && $opts{'-pos'} eq 'left') { $str =~ m/^(.*?)($begre)$/; my ($lead, $body) = ($1, $2); if (length($lead) > 4) { if ($lead =~ m/&[^;]*$/) { $body =~ s/^[^;]*;//; } $lead = "... "; } return "$lead$body"; } elsif (defined $opts{'-pos'} && $opts{'-pos'} eq 'center') { $str =~ m/^($endre)(.*)$/; my ($left, $str) = ($1, $2); $str =~ m/^(.*?)($begre)$/; my ($mid, $right) = ($1, $2); if (length($mid) > 5) { $left =~ s/&[^;]*$//; if ($mid =~ m/&[^;]*$/) { $right =~ s/^[^;]*;//; } $mid = " ... "; } return "$left$mid$right"; } else { $str =~ m/^($endre)(.*)$/; my $body = $1; my $tail = $2; if (length($tail) > 4) { $body =~ s/&[^;]*$//; $tail = " ..."; } return "$body$tail"; } } -- >8 -- Example usage: chop_str($str, 15, 5, -pos=>'center') -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC/PATCH] gitweb: Option to chop at beginning and in the middle in chop_str 2008-02-22 19:14 ` Jakub Narebski @ 2008-02-23 21:44 ` Jakub Narebski 2008-02-23 22:04 ` [PATCH] gitweb: Better chopping in commit search results Junio C Hamano 2008-02-24 13:01 ` [RFC/PATCH v2] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski 2 siblings, 0 replies; 16+ messages in thread From: Jakub Narebski @ 2008-02-23 21:44 UTC (permalink / raw) To: git; +Cc: Jean-Baptiste Quenot, Junio C Hamano, Petr Baudis, Jakub Narebski Add support for '-cut' option to chop_str subroutine, to cut at the beginning (from the left side of the string), in the middle (center of the string), or at the end (from the right side of the string) which is the default: chop_str(somestring, len, slop, -pos=>'left') -> ... string chop_str(somestring, len, slop, -pos=>'center') -> som ... ing chop_str(somestring, len, slop, -pos=>'right') -> somestr ... Simplify passing all arguments to chop_str in chop_and_escape_str subroutine. This was needed to pass additional options to chop_str. Make use of new feature of chop_str to better cut matched string and its context in match info for searching commit messages (commit search), as proposed by Junio C Hamano. For example, if you are looking for "very long ... and how" in the first paragraph of message (if it were all on a single line), you would now see: ...st this with <<very long ... and how>> the actual out... instead of: Could som... <<very long search stri...>> the actual out... (where <<something>> denotes emphasized / colored fragment). Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- And here it is as a patch to gitweb to play with. I agree with Junio that the output is better, but I'm not sure if it is worth the complication in code; perhaps is. gitweb/gitweb.perl | 74 +++++++++++++++++++++++++++++++++++++++++----------- 1 files changed, 58 insertions(+), 16 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index e8226b1..59d44b3 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -848,32 +848,74 @@ sub project_in_list { ## ---------------------------------------------------------------------- ## HTML aware string manipulation +# cut (chop) string to given length, with additional slop, +# optionally from left and in the middle. sub chop_str { my $str = shift; my $len = shift; my $add_len = shift || 10; + # supported opts: + # * -cut => 'left' | 'center' | 'right', defaults to 'right' + # denotes where (which part) to chop + my %opts = @_; # allow only $len chars, but don't cut a word if it would fit in $add_len # if it doesn't fit, cut it if it's still longer than the dots we would add - $str =~ m/^(.{0,$len}[^ \/\-_:\.@]{0,$add_len})(.*)/; - my $body = $1; - my $tail = $2; - if (length($tail) > 4) { - $tail = " ..."; - $body =~ s/&[^;]*$//; # remove chopped character entities + # remove chopped character entities entirely + + # when chopping in the middle, distribute $len into left and right part + if (defined $opts{'-cut'} && $opts{'-cut'} eq 'center') { + $len = int($len/2); + } + + # regexps: ending and beginning with word part up to $add_len + my $endre = qr/.{0,$len}[^ \/\-_:\.@]{0,$add_len}/; + my $begre = qr/[^ \/\-_:\.@]{0,$add_len}.{0,$len}/; + + if (defined $opts{'-cut'} && $opts{'-cut'} eq 'left') { + $str =~ m/^(.*?)($begre)$/; + my ($lead, $body) = ($1, $2); + if (length($lead) > 4) { + if ($lead =~ m/&[^;]*$/) { + $body =~ s/^[^;]*;//; + } + $lead = "... "; + } + return "$lead$body"; + + } elsif (defined $opts{'-cut'} && $opts{'-cut'} eq 'center') { + $str =~ m/^($endre)(.*)$/; + my ($left, $str) = ($1, $2); + $str =~ m/^(.*?)($begre)$/; + my ($mid, $right) = ($1, $2); + if (length($mid) > 5) { + $left =~ s/&[^;]*$//; + if ($mid =~ m/&[^;]*$/) { + $right =~ s/^[^;]*;//; + } + $mid = " ... "; + } + return "$left$mid$right"; + + } else { + $str =~ m/^($endre)(.*)$/; + my $body = $1; + my $tail = $2; + if (length($tail) > 4) { + $body =~ s/&[^;]*$//; + $tail = " ..."; + } + return "$body$tail"; } - return "$body$tail"; } # takes the same arguments as chop_str, but also wraps a <span> around the # result with a title attribute if it does get chopped. Additionally, the # string is HTML-escaped. sub chop_and_escape_str { - my $str = shift; - my $len = shift; - my $add_len = shift || 10; + my ($str) = @_; - my $chopped = chop_str($str, $len, $add_len); + my $chopped = chop_str(@_); if ($chopped eq $str) { return esc_html($chopped); } else { @@ -3791,11 +3833,11 @@ sub git_search_grep_body { foreach my $line (@$comment) { if ($line =~ m/^(.*)($search_regexp)(.*)$/i) { my ($lead, $match, $trail) = ($1, $2, $3); - $match = chop_str($match, 70, 5); # in case match is very long - my $contextlen = int((80 - length($match))/2); # for the remainder - $contextlen = 30 if ($contextlen > 30); # but not too much - $lead = chop_str($lead, $contextlen, 10); - $trail = chop_str($trail, $contextlen, 10); + $match = chop_str($match, 70, 5, -cut=>'center'); + my $contextlen = int((80 - length($match))/2); + $contextlen = 30 if ($contextlen > 30); + $lead = chop_str($lead, $contextlen, 10, -cut=>'left'); + $trail = chop_str($trail, $contextlen, 10, -cut=>'right'); $lead = esc_html($lead); $match = esc_html($match); ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] gitweb: Better chopping in commit search results 2008-02-22 19:14 ` Jakub Narebski 2008-02-23 21:44 ` [RFC/PATCH] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski @ 2008-02-23 22:04 ` Junio C Hamano 2008-02-23 23:36 ` Jakub Narebski 2008-02-24 13:01 ` [RFC/PATCH v2] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski 2 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2008-02-23 22:04 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Jean-Baptiste Quenot Jakub Narebski <jnareb@gmail.com> writes: > # regexps: ending and beginning with word part up to $add_len > my $endre = qr/.{0,$len}[^ \/\-_:\.@]{0,$add_len}/; > my $begre = qr/[^ \/\-_:\.@]{0,$add_len}.{0,$len}/; I have no idea what these line noise characters inside [] are. Did you mean something like "\w"? I have a suspicion that it may be easier to read and could be even more efficient to split an overlong line at word boundaries and to remove elements from the end you are removing from until it fits. sub chop_whence { my ($line, $max, $slop, $where) = @_; my $len = length($line); if ($len < $max + $slop) { return $line; } # Cut at word boundaries my @split = split(/\b/, $line); my $filler = "..."; while ((2 < @split)) { my $removed; my $splice_at; if ($where eq 'left') { $removed = shift @split; } elsif ($where eq 'right') { $removed = pop @split; } else { my $splice_at = int($#split / 2); $removed = splice(@split, $splice_at, 1); } $len -= length($removed) + length($filler); if ($len < $max + $slop) { if ($where eq 'left') { unshift @split, $filler; } elsif ($where eq 'right') { push @split, $filler; } else { my $splice_at = int($#split / 2); splice(@split, $splice_at, 0, $filler); } return join('', @split); } } # give up return $line; } ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] gitweb: Better chopping in commit search results 2008-02-23 22:04 ` [PATCH] gitweb: Better chopping in commit search results Junio C Hamano @ 2008-02-23 23:36 ` Jakub Narebski 0 siblings, 0 replies; 16+ messages in thread From: Jakub Narebski @ 2008-02-23 23:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sat, 23 Feb 2008, Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > > > # regexps: ending and beginning with word part up to $add_len > > my $endre = qr/.{0,$len}[^ \/\-_:\.@]{0,$add_len}/; > > my $begre = qr/[^ \/\-_:\.@]{0,$add_len}.{0,$len}/; > > I have no idea what these line noise characters inside [] are. > Did you mean something like "\w"? They were in original chop_str, written by Kay Sievers if I have checked correctly, I have only repeated it. But changing it to "\w" might be a good idea. > I have a suspicion that it may be easier to read and could be > even more efficient to split an overlong line at word boundaries > and to remove elements from the end you are removing from until > it fits. > > sub chop_whence { > my ($line, $max, $slop, $where) = @_; IMHO it is neither easier to read, nor more efficient. And changes semantic a bit. Original chop_str used to mean: chop_str($str, $len, $add_len) means: Try to chop on a word boundary between position $len and $len+$add_len. If there is no word boundary between $len and $len+$add_len, chop at $add_len, and replace chopped part by " ...". Do not chop if to be replaced part is shorter than ellipsis, i.e. " ..." replacement. (I think I add this description to gitweb). With this description the code is I think quite obvious. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC/PATCH v2] gitweb: Option to chop at beginning and in the middle in chop_str 2008-02-22 19:14 ` Jakub Narebski 2008-02-23 21:44 ` [RFC/PATCH] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski 2008-02-23 22:04 ` [PATCH] gitweb: Better chopping in commit search results Junio C Hamano @ 2008-02-24 13:01 ` Jakub Narebski 2008-02-25 1:46 ` Junio C Hamano 2 siblings, 1 reply; 16+ messages in thread From: Jakub Narebski @ 2008-02-24 13:01 UTC (permalink / raw) To: git; +Cc: Jean-Baptiste Quenot, Junio C Hamano, Petr Baudis, Jakub Narebski Add support for '-cut' option to chop_str subroutine, to cut at the beginning (from the left side of the string), in the middle (center of the string), or at the end (from the right side of the string) which is the default: chop_str(somestring, len, slop, -cut=>'left') -> ' ...string' chop_str(somestring, len, slop, -cut=>'center') -> 'som ... ing' chop_str(somestring, len, slop, -cut=>'right') -> 'somestr... ' While at it return from chop_str early if given string is so short that chop_str couldn't shorten it, and simplify regexp used. Make ellipsis (ots) stick to shorthened fragment for cutting at ends. Simplify passing all arguments to chop_str in chop_and_escape_str subroutine. This was needed to pass additional options to chop_str. Make use of new feature of chop_str to better cut matched string and its context in match info for searching commit messages (commit search), as proposed by Junio C Hamano. For example, if you are looking for "very long ... and how" in the first paragraph of message (if it were all on a single line), you would now see: ...st this with <<very long ... and how>> the actual out... instead of: Could som... <<very long search stri...>> the actual out... (where <<something>> denotes emphasized / colored fragment). Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This is second version of the patch, with regexp simplified, and early return. Rudimentarly tested. gitweb/gitweb.perl | 77 +++++++++++++++++++++++++++++++++++++++++----------- 1 files changed, 61 insertions(+), 16 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index e8226b1..9a8e0a6 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -848,32 +848,77 @@ sub project_in_list { ## ---------------------------------------------------------------------- ## HTML aware string manipulation +# Try to chop given string on a word boundary between position +# $len and $len+$add_len. If there is no word boundary there, +# chop at $len+$add_len. Do not chop if chopped part plus ellipsis +# (marking chopped part) would be longer than given string. sub chop_str { my $str = shift; my $len = shift; my $add_len = shift || 10; + my $where = shift || 'right' # 'left' | 'center' | 'right' # allow only $len chars, but don't cut a word if it would fit in $add_len # if it doesn't fit, cut it if it's still longer than the dots we would add - $str =~ m/^(.{0,$len}[^ \/\-_:\.@]{0,$add_len})(.*)/; - my $body = $1; - my $tail = $2; - if (length($tail) > 4) { - $tail = " ..."; - $body =~ s/&[^;]*$//; # remove chopped character entities + # remove chopped character entities entirely + + # when chopping in the middle, distribute $len into left and right part + # return early if chupping wouldn't make string shorter + if ($where eq 'center') { + return $str if ($len + 5 >= length($str)); # filler is length 5 + $len = int($len/2); + } else { + return $str if ($len + 4 >= length($str)); # filler is length 4 + } + + # regexps: ending and beginning with word part up to $add_len + my $endre = qr/.{$len}\w{0,$add_len}/; + my $begre = qr/\w{0,$add_len}.{$len}/; + + if ($where eq 'left') { + $str =~ m/^(.*?)($begre)$/; + my ($lead, $body) = ($1, $2); + if (length($lead) > 4) { + if ($lead =~ m/&[^;]*$/) { + $body =~ s/^[^;]*;//; + } + $lead = " ..."; + } + return "$lead$body"; + + } elsif ($where eq 'center') { + $str =~ m/^($endre)(.*)$/; + my ($left, $str) = ($1, $2); + $str =~ m/^(.*?)($begre)$/; + my ($mid, $right) = ($1, $2); + if (length($mid) > 5) { + $left =~ s/&[^;]*$//; + if ($mid =~ m/&[^;]*$/) { + $right =~ s/^[^;]*;//; + } + $mid = " ... "; + } + return "$left$mid$right"; + + } else { + $str =~ m/^($endre)(.*)$/; + my $body = $1; + my $tail = $2; + if (length($tail) > 4) { + $body =~ s/&[^;]*$//; + $tail = "... "; + } + return "$body$tail"; } - return "$body$tail"; } # takes the same arguments as chop_str, but also wraps a <span> around the # result with a title attribute if it does get chopped. Additionally, the # string is HTML-escaped. sub chop_and_escape_str { - my $str = shift; - my $len = shift; - my $add_len = shift || 10; + my ($str) = @_; - my $chopped = chop_str($str, $len, $add_len); + my $chopped = chop_str(@_); if ($chopped eq $str) { return esc_html($chopped); } else { @@ -3791,11 +3836,11 @@ sub git_search_grep_body { foreach my $line (@$comment) { if ($line =~ m/^(.*)($search_regexp)(.*)$/i) { my ($lead, $match, $trail) = ($1, $2, $3); - $match = chop_str($match, 70, 5); # in case match is very long - my $contextlen = int((80 - length($match))/2); # for the remainder - $contextlen = 30 if ($contextlen > 30); # but not too much - $lead = chop_str($lead, $contextlen, 10); - $trail = chop_str($trail, $contextlen, 10); + $match = chop_str($match, 70, 5, 'center'); + my $contextlen = int((80 - length($match))/2); + $contextlen = 30 if ($contextlen > 30); + $lead = chop_str($lead, $contextlen, 10, 'left'); + $trail = chop_str($trail, $contextlen, 10, 'right'); $lead = esc_html($lead); $match = esc_html($match); -- Stacked GIT 0.14.1 git version 1.5.4.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC/PATCH v2] gitweb: Option to chop at beginning and in the middle in chop_str 2008-02-24 13:01 ` [RFC/PATCH v2] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski @ 2008-02-25 1:46 ` Junio C Hamano 2008-02-25 20:07 ` [RFC/PATCH v3] gitweb: Better cutting matched string and its context Jakub Narebski 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2008-02-25 1:46 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Jean-Baptiste Quenot, Petr Baudis Jakub Narebski <jnareb@gmail.com> writes: > Make use of new feature of chop_str to better cut matched string and > its context in match info for searching commit messages (commit > search), as proposed by Junio C Hamano. For example, if you are > looking for "very long ... and how" in the first paragraph of this message > (if it were all on a single line), you would now see: > > ...st this with <<very long ... and how>> the actual out... > > instead of: > > Could som... <<very long search stri...>> the actual out... > > (where <<something>> denotes emphasized / colored fragment). This part needs rewritten; the first paragraph of what message is that? Also I think the subject is wrong. Yes, it is adding an option to an internal subroutine but who cares? The net effect the "gitweb" users see is that the way the grep result is shown differently, hopefully in a more understandable way, and that change is not _optional_ at all. The code looks easier to read than before, but I may be partial ;-) ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC/PATCH v3] gitweb: Better cutting matched string and its context 2008-02-25 1:46 ` Junio C Hamano @ 2008-02-25 20:07 ` Jakub Narebski 2008-02-25 20:18 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jakub Narebski @ 2008-02-25 20:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jean-Baptiste Quenot, Petr Baudis On Mon, 25 Feb 2008, Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > >> Make use of new feature of chop_str to better cut matched string and >> its context in match info for searching commit messages (commit >> search), as proposed by Junio C Hamano. For example, if you are >> looking for "very long ... and how" in the first paragraph of this message >> (if it were all on a single line), you would now see: >> >> ...st this with <<very long ... and how>> the actual out... >> >> instead of: >> >> Could som... <<very long search stri...>> the actual out... >> >> (where <<something>> denotes emphasized / colored fragment). > > This part needs rewritten; the first paragraph of what message is that? Ooops. Sorry about that. I have forgot to transport context when doing copy'n'paste. BTW. the example was subtly wrong: you can search _lines_, like grep, you cannot search multiline. > Also I think the subject is wrong. Yes, it is adding an option > to an internal subroutine but who cares? The net effect the > "gitweb" users see is that the way the grep result is shown > differently, hopefully in a more understandable way, and that > change is not _optional_ at all. Below there is corrected commit message (rewritten to match code). > The code looks easier to read than before, but I may be partial > ;-) Further improved, IMVHO. BTW. I have queued 3-patch series further improving gitweb search, to be sent soon... ..................................................................... -- >8 -- From: Jakub Narebski <jnareb@gmail.com> Subject: [PATCH] gitweb: Better cutting matched string and its context Improve look of commit search output ('search' view) by better cutting of matched string and its context in match info, as proposed by Junio Hamano. For example, if you are looking for "very long search string" in the following line: Could somebody test this with very long search string, and see how you would now see: ...this with <<very long ... string>>, and see... instead of: Could som... <<very long search...>>, and see... (where <<something>> denotes emphasized / colored fragment; matched fragment to be more exact). For this feature support for fourth [optional] parameter to chop_str subroutine was added. This fourth parameter is used to denote where to cut string to make it shorter. chop_str can now cut at the beginning (from the _left_ side of the string), in the middle (_center_ of the string), or at the end (from the _right_ side of the string); cutting from right is the default: chop_str(somestring, len, slop, 'left') -> ' ...string' chop_str(somestring, len, slop, 'center') -> 'som ... ing' chop_str(somestring, len, slop, 'right') -> 'somestr... ' If you want to use default slop (default additional length), use undef as value for third parameter to chop_str. While at it, return from chop_str early if given string is so short that chop_str couldn't shorten it. Simplify also regexp used by chop_str. Make ellipsis (dots) stick to shortened fragment for cutting at ends, to better see which part got shortened. Simplify passing all arguments to chop_str in chop_and_escape_str subroutine. This was needed to pass additional options to chop_str. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/gitweb.perl | 73 ++++++++++++++++++++++++++++++++++++++++----------- 1 files changed, 57 insertions(+), 16 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index e8226b1..fc95e2c 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -848,32 +848,73 @@ sub project_in_list { ## ---------------------------------------------------------------------- ## HTML aware string manipulation +# Try to chop given string on a word boundary between position +# $len and $len+$add_len. If there is no word boundary there, +# chop at $len+$add_len. Do not chop if chopped part plus ellipsis +# (marking chopped part) would be longer than given string. sub chop_str { my $str = shift; my $len = shift; my $add_len = shift || 10; + my $where = shift || 'right'; # 'left' | 'center' | 'right' # allow only $len chars, but don't cut a word if it would fit in $add_len # if it doesn't fit, cut it if it's still longer than the dots we would add - $str =~ m/^(.{0,$len}[^ \/\-_:\.@]{0,$add_len})(.*)/; - my $body = $1; - my $tail = $2; - if (length($tail) > 4) { - $tail = " ..."; - $body =~ s/&[^;]*$//; # remove chopped character entities + # remove chopped character entities entirely + + # when chopping in the middle, distribute $len into left and right part + # return early if chopping wouldn't make string shorter + if ($where eq 'center') { + return $str if ($len + 5 >= length($str)); # filler is length 5 + $len = int($len/2); + } else { + return $str if ($len + 4 >= length($str)); # filler is length 4 + } + + # regexps: ending and beginning with word part up to $add_len + my $endre = qr/.{$len}\w{0,$add_len}/; + my $begre = qr/\w{0,$add_len}.{$len}/; + + if ($where eq 'left') { + $str =~ m/^(.*?)($begre)$/; + my ($lead, $body) = ($1, $2); + if (length($lead) > 4) { + $body =~ s/^[^;]*;// if ($lead =~ m/&[^;]*$/); + $lead = " ..."; + } + return "$lead$body"; + + } elsif ($where eq 'center') { + $str =~ m/^($endre)(.*)$/; + my ($left, $str) = ($1, $2); + $str =~ m/^(.*?)($begre)$/; + my ($mid, $right) = ($1, $2); + if (length($mid) > 5) { + $left =~ s/&[^;]*$//; + $right =~ s/^[^;]*;// if ($mid =~ m/&[^;]*$/); + $mid = " ... "; + } + return "$left$mid$right"; + + } else { + $str =~ m/^($endre)(.*)$/; + my $body = $1; + my $tail = $2; + if (length($tail) > 4) { + $body =~ s/&[^;]*$//; + $tail = "... "; + } + return "$body$tail"; } - return "$body$tail"; } # takes the same arguments as chop_str, but also wraps a <span> around the # result with a title attribute if it does get chopped. Additionally, the # string is HTML-escaped. sub chop_and_escape_str { - my $str = shift; - my $len = shift; - my $add_len = shift || 10; + my ($str) = @_; - my $chopped = chop_str($str, $len, $add_len); + my $chopped = chop_str(@_); if ($chopped eq $str) { return esc_html($chopped); } else { @@ -3791,11 +3832,11 @@ sub git_search_grep_body { foreach my $line (@$comment) { if ($line =~ m/^(.*)($search_regexp)(.*)$/i) { my ($lead, $match, $trail) = ($1, $2, $3); - $match = chop_str($match, 70, 5); # in case match is very long - my $contextlen = int((80 - length($match))/2); # for the remainder - $contextlen = 30 if ($contextlen > 30); # but not too much - $lead = chop_str($lead, $contextlen, 10); - $trail = chop_str($trail, $contextlen, 10); + $match = chop_str($match, 70, 5, 'center'); + my $contextlen = int((80 - length($match))/2); + $contextlen = 30 if ($contextlen > 30); + $lead = chop_str($lead, $contextlen, 10, 'left'); + $trail = chop_str($trail, $contextlen, 10, 'right'); $lead = esc_html($lead); $match = esc_html($match); -- 1.5.4.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC/PATCH v3] gitweb: Better cutting matched string and its context 2008-02-25 20:07 ` [RFC/PATCH v3] gitweb: Better cutting matched string and its context Jakub Narebski @ 2008-02-25 20:18 ` Junio C Hamano 0 siblings, 0 replies; 16+ messages in thread From: Junio C Hamano @ 2008-02-25 20:18 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Jean-Baptiste Quenot, Petr Baudis Jakub Narebski <jnareb@gmail.com> writes: > Ooops. Sorry about that. I have forgot to transport context when > doing copy'n'paste. > ... > BTW. the example was subtly wrong: you can search _lines_, like grep, > you cannot search multiline. Yeah, the original had: For example, if you are looking for "very long ... and how" in the first paragraph of message (if it were all on a single line), wouldn't you want to see: and you dropped "if it were all on a single line" part, as you forgot to transport context when doing copy-n-paste ;-) > Below there is corrected commit message (rewritten to match code). Thanks. Will take a look later; I am at day-job today. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] gitweb: Better chopping in commit search results 2008-02-22 16:33 ` [PATCH] gitweb: Better chopping in commit search results Jakub Narebski 2008-02-22 17:14 ` Junio C Hamano @ 2008-02-23 9:27 ` Karl Hasselström 2008-02-23 10:20 ` Jakub Narebski 1 sibling, 1 reply; 16+ messages in thread From: Karl Hasselström @ 2008-02-23 9:27 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Jean-Baptiste Quenot, Junio C Hamano On 2008-02-22 17:33:47 +0100, Jakub Narebski wrote: > Strange that StGit always sends patches (stg mail) as if repo owner > was their author, regardless of path/commit author (I think; unless > "stg edit" cannot change authorship). It'll always put the sender in the From: header in the mail, obviously. But the patch mail template contains a "%(fromauth)s" string, which will be replaced by "From: Real Author <ra@example.com>\n", or the empty string if the patch author is the same as the sender. Your problem might be that you're using an old template. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] gitweb: Better chopping in commit search results 2008-02-23 9:27 ` [PATCH] gitweb: Better chopping in commit search results Karl Hasselström @ 2008-02-23 10:20 ` Jakub Narebski 0 siblings, 0 replies; 16+ messages in thread From: Jakub Narebski @ 2008-02-23 10:20 UTC (permalink / raw) To: Karl Hasselström; +Cc: git On Sat, 23 Feb 2008, Karl Hasselström wrote: > On 2008-02-22 17:33:47 +0100, Jakub Narebski wrote: > > > Strange that StGit always sends patches (stg mail) as if repo owner > > was their author, regardless of path/commit author (I think; unless > > "stg edit" cannot change authorship). > > It'll always put the sender in the From: header in the mail, > obviously. But the patch mail template contains a "%(fromauth)s" > string, which will be replaced by "From: Real Author > <ra@example.com>\n", or the empty string if the patch author is the > same as the sender. > > Your problem might be that you're using an old template. Thanks a lot. That was it. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2008-02-25 20:19 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-02-13 17:37 [PATCH] Do not chop HTML tags in commit search result Jean-Baptiste Quenot 2008-02-13 19:16 ` Jakub Narebski 2008-02-13 19:43 ` Junio C Hamano 2008-02-22 16:33 ` [PATCH] gitweb: Better chopping in commit search results Jakub Narebski 2008-02-22 17:14 ` Junio C Hamano 2008-02-22 17:49 ` Jakub Narebski 2008-02-22 19:14 ` Jakub Narebski 2008-02-23 21:44 ` [RFC/PATCH] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski 2008-02-23 22:04 ` [PATCH] gitweb: Better chopping in commit search results Junio C Hamano 2008-02-23 23:36 ` Jakub Narebski 2008-02-24 13:01 ` [RFC/PATCH v2] gitweb: Option to chop at beginning and in the middle in chop_str Jakub Narebski 2008-02-25 1:46 ` Junio C Hamano 2008-02-25 20:07 ` [RFC/PATCH v3] gitweb: Better cutting matched string and its context Jakub Narebski 2008-02-25 20:18 ` Junio C Hamano 2008-02-23 9:27 ` [PATCH] gitweb: Better chopping in commit search results Karl Hasselström 2008-02-23 10:20 ` Jakub Narebski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).