git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 0/8] gitweb: Faster and improved project search
@ 2012-02-15 20:38 Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 1/8] gitweb: Refactor checking if part of project info need filling Jakub Narebski
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

[Cc-ing Junio because of his involvement in discussion about first
 patch in previous version of this series.]

First three patches in this series are mainly about speeding up
project search (and perhaps in the future also project pagination).
Well, first one is unification, refactoring and future-proofing.
The second and third patch could be squashed together; second adds
@fill_only, but third actually uses it.

Next set of patches is about highlighting matched part, making it
easier to recognize why project was selected, what we were searching
for (though better page title would also help second issue).

Well, fourth patch (first in set mentioned above) is here for the
commit message, otherwise it could have been squashed with next one.

Last patch in this series is beginning of using esc_html_match_hl()
for other searches in gitweb -- the easiest part.

Jakub Narebski (8):
  gitweb: Refactor checking if part of project info need filling
  gitweb: Option for filling only specified info in
    fill_project_list_info
  gitweb: Faster project search
  gitweb: Introduce esc_html_hl_regions
  gitweb: Highlight matched part of project name when searching
    projects
  gitweb: Highlight matched part of project description when searching
    projects
  gitweb: Highlight matched part of shortened project description
  gitweb: Use esc_html_match_hl() in 'grep' search

 gitweb/gitweb.perl |  158 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 135 insertions(+), 23 deletions(-)

-- 
1.7.9

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 1/8] gitweb: Refactor checking if part of project info need filling
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 2/8] gitweb: Option for filling only specified info in fill_project_list_info Jakub Narebski
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Extract the check if given keys (given parts) of project info needs to
be filled into project_info_needs_filling() subroutine.  It is for now
a thin wrapper around "!exists $project_info->{$key}".

Note that "!defined" was replaced by more correct "!exists".

While at it uniquify treating of all project info, adding checks for
'age' field before running git_get_last_activity(), and also checking
for all keys filled in code protected by conditional, and not only
one.

The code now looks like this

  foreach my $project (@$project_list) {
  	if (given keys need to be filled) {
  		fill given keys
  	}
  	...
  }

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch is new in this series, and splits off introduction of
project_info_needs_filling() function from partial filling via
@fill_only.

Hopefully this makes change more clear.

Note that this change stands alone, regardless of speeding up project
search.

 gitweb/gitweb.perl |   35 +++++++++++++++++++++++++++--------
 1 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 057ba5b..e62c2ef 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5195,6 +5195,21 @@ sub git_project_search_form {
 	print "</div>\n";
 }
 
+# entry for given @keys needs filling if at least one of keys in list
+# is not present in %$project_info
+sub project_info_needs_filling {
+	my ($project_info, @keys) = @_;
+
+	# return List::MoreUtils::any { !exists $project_info->{$_} } @keys;
+	foreach my $key (@keys) {
+		if (!exists $project_info->{$key}) {
+			return 1;
+		}
+	}
+	return;
+}
+
+
 # fills project list info (age, description, owner, category, forks)
 # for each project in the list, removing invalid projects from
 # returned list
@@ -5206,24 +5221,28 @@ sub fill_project_list_info {
 	my $show_ctags = gitweb_check_feature('ctags');
  PROJECT:
 	foreach my $pr (@$projlist) {
-		my (@activity) = git_get_last_activity($pr->{'path'});
-		unless (@activity) {
-			next PROJECT;
+		if (project_info_needs_filling($pr, 'age', 'age_string')) {
+			my (@activity) = git_get_last_activity($pr->{'path'});
+			unless (@activity) {
+				next PROJECT;
+			}
+			($pr->{'age'}, $pr->{'age_string'}) = @activity;
 		}
-		($pr->{'age'}, $pr->{'age_string'}) = @activity;
-		if (!defined $pr->{'descr'}) {
+		if (project_info_needs_filling($pr, 'descr', 'descr_long')) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
 			$descr = to_utf8($descr);
 			$pr->{'descr_long'} = $descr;
 			$pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
 		}
-		if (!defined $pr->{'owner'}) {
+		if (project_info_needs_filling($pr, 'owner')) {
 			$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
 		}
-		if ($show_ctags) {
+		if ($show_ctags &&
+		    project_info_needs_filling($pr, 'ctags')) {
 			$pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
 		}
-		if ($projects_list_group_categories && !defined $pr->{'category'}) {
+		if ($projects_list_group_categories &&
+		    project_info_needs_filling($pr, 'category')) {
 			my $cat = git_get_project_category($pr->{'path'}) ||
 			                                   $project_list_default_category;
 			$pr->{'category'} = to_utf8($cat);
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 2/8] gitweb: Option for filling only specified info in fill_project_list_info
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 1/8] gitweb: Refactor checking if part of project info need filling Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-20  6:58   ` Junio C Hamano
  2012-02-15 20:38 ` [PATCHv2 3/8] gitweb: Faster project search Jakub Narebski
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Introduce project_info_needs_filling($pr, $key[, \%fill_only]), which
is now used in place of simple 'defined $pr->{$key}' to check if
specific slot in project needs to be filled.

This is in preparation of future lazy filling of project info in
project search and pagination of sorted list of projects.  The only
functional change is that fill_project_list_info() now checks if 'age'
is already filled before running git_get_last_activity().

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This could have been squashed with the next commit, but this way it is
pure refactoring that shouldn't change gitweb behavior.

Changes from v1:
* Introduction of project_info_needs_filling() was moved to separate
  commit; this one just adds @fill_only / %fill_only, without actually 
  using it.

* It uses \%fill_only (reference to hash) rather than @fill_only,
  because project_info_needs_filling() now supports multiple keys,
  and because checking key in hash is faster O(1) than checking every
  element in array O(n).

  Though this shouldn't matter much, as @fill_only has at most two
  or three elements, as we would see in the next commit.

 gitweb/gitweb.perl |   34 +++++++++++++++++++++++-----------
 1 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e62c2ef..cae71f5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5195,54 +5195,66 @@ sub git_project_search_form {
 	print "</div>\n";
 }
 
-# entry for given @keys needs filling if at least one of keys in list
-# is not present in %$project_info
+
+# entry for given @keys needs filling if at least one of interesting keys
+# in list is not present in %$project_info; key is interesting if $fill_only
+# is not passed, or is empty (all keys are interesting in both of those cases),
+# or if key is in $fill_only hash
+#
+# USAGE:
+# * project_info_needs_filling($project_info, 'key', ...)
+# * project_info_needs_filling($project_info, 'key', ..., \%fill_only)
+#   where %fill_only = map { $_ => 1 } @fill_only;
 sub project_info_needs_filling {
+	my $fill_only = ref($_[-1]) ? pop : undef;
 	my ($project_info, @keys) = @_;
 
 	# return List::MoreUtils::any { !exists $project_info->{$_} } @keys;
 	foreach my $key (@keys) {
-		if (!exists $project_info->{$key}) {
+		if ((!$fill_only || !%$fill_only || $fill_only->{$key}) &&
+		    !exists $project_info->{$key}) {
 			return 1;
 		}
 	}
 	return;
 }
 
-
 # fills project list info (age, description, owner, category, forks)
 # for each project in the list, removing invalid projects from
-# returned list
+# returned list, or fill only specified info (removing invalid projects
+# only when filling 'age').
+#
 # NOTE: modifies $projlist, but does not remove entries from it
 sub fill_project_list_info {
-	my $projlist = shift;
+	my ($projlist, @fill_only) = @_;
+	my %fill_only = map { $_ => 1 } @fill_only;
 	my @projects;
 
 	my $show_ctags = gitweb_check_feature('ctags');
  PROJECT:
 	foreach my $pr (@$projlist) {
-		if (project_info_needs_filling($pr, 'age', 'age_string')) {
+		if (project_info_needs_filling($pr, 'age', 'age_string', \%fill_only)) {
 			my (@activity) = git_get_last_activity($pr->{'path'});
 			unless (@activity) {
 				next PROJECT;
 			}
 			($pr->{'age'}, $pr->{'age_string'}) = @activity;
 		}
-		if (project_info_needs_filling($pr, 'descr', 'descr_long')) {
+		if (project_info_needs_filling($pr, 'descr', 'descr_long', \%fill_only)) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
 			$descr = to_utf8($descr);
 			$pr->{'descr_long'} = $descr;
 			$pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
 		}
-		if (project_info_needs_filling($pr, 'owner')) {
+		if (project_info_needs_filling($pr, 'owner', \%fill_only)) {
 			$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
 		}
 		if ($show_ctags &&
-		    project_info_needs_filling($pr, 'ctags')) {
+		    project_info_needs_filling($pr, 'ctags', \%fill_only)) {
 			$pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
 		}
 		if ($projects_list_group_categories &&
-		    project_info_needs_filling($pr, 'category')) {
+		    project_info_needs_filling($pr, 'category', \%fill_only)) {
 			my $cat = git_get_project_category($pr->{'path'}) ||
 			                                   $project_list_default_category;
 			$pr->{'category'} = to_utf8($cat);
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 3/8] gitweb: Faster project search
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 1/8] gitweb: Refactor checking if part of project info need filling Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 2/8] gitweb: Option for filling only specified info in fill_project_list_info Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-20  8:33   ` Junio C Hamano
  2012-02-15 20:38 ` [PATCHv2 4/8] gitweb: Introduce esc_html_hl_regions Jakub Narebski
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Before searching by some field the information we search for must be
filled in.  For this fill_project_list_info() was enhanced in previous
commit to take additional parameters which part of projects info to
fill.  This way we can limit doing expensive calculations (like
running git-for-each-ref to get 'age' / "Last changed" info) only to
projects which we will show as search results.

With this commit the number of git commands used to generate search
results is 2*<matched projects> + 1, and depends on number of matched
projects rather than number of all projects (all repositories).

Note: this is 'git for-each-ref' to find last activity, and 'git config'
for each project, and 'git --version' once.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
search_projects_list() now pre-fills required parts of project info by
itself, so running fill_project_list_info() before calling it is no
longer necessary and actually you should not do it.

No changes from v1

 gitweb/gitweb.perl |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cae71f5..cc2bd6d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2987,6 +2987,10 @@ sub search_projects_list {
 	return @$projlist
 		unless ($tagfilter || $searchtext);
 
+	# searching projects require filling to be run before it;
+	fill_project_list_info($projlist,
+	                       $tagfilter  ? 'ctags' : (),
+	                       $searchtext ? ('path', 'descr') : ());
 	my @projects;
  PROJECT:
 	foreach my $pr (@$projlist) {
@@ -5390,12 +5394,13 @@ sub git_project_list_body {
 	# filtering out forks before filling info allows to do less work
 	@projects = filter_forks_from_projects_list(\@projects)
 		if ($check_forks);
-	@projects = fill_project_list_info(\@projects);
-	# searching projects require filling to be run before it
+	# search_projects_list pre-fills required info
 	@projects = search_projects_list(\@projects,
 	                                 'searchtext' => $searchtext,
 	                                 'tagfilter'  => $tagfilter)
 		if ($tagfilter || $searchtext);
+	# fill the rest
+	@projects = fill_project_list_info(\@projects);
 
 	$order ||= $default_projects_order;
 	$from = 0 unless defined $from;
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 4/8] gitweb: Introduce esc_html_hl_regions
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
                   ` (2 preceding siblings ...)
  2012-02-15 20:38 ` [PATCHv2 3/8] gitweb: Faster project search Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 5/8] gitweb: Highlight matched part of project name when searching projects Jakub Narebski
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

The esc_html_hl_regions() subroutine added in this commit is meant to
highlight in a given string a list of regions (given as a list of
[ beg, end ] pairs of positions in string), using HTML <span> element
with given class.

It is to be used in next commit for highlighting matched part in
project search.

Implementation and enhancement notes:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Currently esc_html_hl_regions() subroutine doesn't accept any
  parameters, like esc_html() does.  We might want for example to
  pass  nbsp=>1  to it.

  It can easily be done with the following code:

    my %opts = grep { ref($_) ne "ARRAY" } @sel;
    @sel     = grep { ref($_) eq "ARRAY" } @sel;

  This allow adding parameters after or before regions, e.g.:

    esc_html_hl_regions("foo bar", "mark", [ 0, 3 ], -nbsp => 1);

* esc_html_hl_regions() escapes like esc_html(); if we wanted to
  highlight with esc_path(), we could pass subroutine reference
  to now named esc_gen_hl_regions().

    esc_html_hl_regions("foo bar", "mark", \&esc_path, [ 0, 3 ]);

  Note that this way we can handle -nbsp=>1 case automatically,
  e.g.

    esc_html_hl_regions("foo bar", "mark",
                        sub { esc_html(@_, -nbsp=>1) },
                        [ 0, 3 ]);

* Alternate solution for highlighting region of a string would be to
  use the idea that strings are to be HTML-escaped, and references to
  scalars are HTML (like in the idea for generic committags).

  This would require modifying gitweb code or esc_html to get list of
  fragments, e.g.:

    esc_html(\'<span class="mark">', 'foo', \'</span>', ' bar',
             { -nbsp => 1 });

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This commit is here only for the notes in the commit message,
otherwise it could have been squashed with the next commit.

Note that we actually end up using first enhancement in notes section
of the above commit message.

This patch is new; it was not present in v1

 gitweb/gitweb.perl |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cc2bd6d..8dcd54b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1715,6 +1715,33 @@ sub chop_and_escape_str {
 	}
 }
 
+# Highlight selected fragments of string, using given CSS class,
+# and escape HTML.  It is assumed that fragments do not overlap.
+# Regions are passed as list of pairs (array references).
+#
+# Example: esc_html_hl_regions("foobar", "mark", [ 0, 3 ]) returns
+# '<span class="mark">foo</span>bar'
+sub esc_html_hl_regions {
+	my ($str, $css_class, @sel) = @_;
+	return esc_html($str) unless @sel;
+
+	my $out = '';
+	my $pos = 0;
+
+	for my $s (@sel) {
+		$out .= esc_html(substr($str, $pos, $s->[0] - $pos))
+			if ($s->[0] - $pos > 0);
+		$out .= $cgi->span({-class => $css_class},
+		                   esc_html(substr($str, $s->[0], $s->[1] - $s->[0])));
+
+		$pos = $s->[1];
+	}
+	$out .= esc_html(substr($str, $pos))
+		if ($pos < length($str));
+
+	return $out;
+}
+
 ## ----------------------------------------------------------------------
 ## functions returning short strings
 
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 5/8] gitweb: Highlight matched part of project name when searching projects
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
                   ` (3 preceding siblings ...)
  2012-02-15 20:38 ` [PATCHv2 4/8] gitweb: Introduce esc_html_hl_regions Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 6/8] gitweb: Highlight matched part of project description " Jakub Narebski
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Use newly introduced esc_html_match_hl() to escape HTML and mark match
with span element with 'match' class.  Currently only 'path' part
(i.e. project name) is highlighted; match might be on the project
description.

The code makes use of the fact that defined $search_regexp means that
there was search going on.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Introducing esc_html_match_hl() could have been put together with
introduction of esc_html_hl_regions() in previos commit.

Changes from v1:
* Main part of esc_html_match_hl() got split into esc_html_hl_regions(),
  which was introduced in previous commit.

 gitweb/gitweb.perl |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 8dcd54b..5596701 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1742,6 +1742,20 @@ sub esc_html_hl_regions {
 	return $out;
 }
 
+# highlight match (if any), and escape HTML
+sub esc_html_match_hl {
+	my ($str, $regexp) = @_;
+	return esc_html($str) unless defined $regexp;
+
+	my @matches;
+	while ($str =~ /$regexp/g) {
+		push @matches, [$-[0], $+[0]];
+	}
+	return esc_html($str) unless @matches;
+
+	return esc_html_hl_regions($str, 'match', @matches);
+}
+
 ## ----------------------------------------------------------------------
 ## functions returning short strings
 
@@ -5389,7 +5403,9 @@ sub git_project_list_rows {
 			print "</td>\n";
 		}
 		print "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
-		                        -class => "list"}, esc_html($pr->{'path'})) . "</td>\n" .
+		                        -class => "list"},
+		                       esc_html_match_hl($pr->{'path'}, $search_regexp)) .
+		      "</td>\n" .
 		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
 		                        -class => "list", -title => $pr->{'descr_long'}},
 		                        esc_html($pr->{'descr'})) . "</td>\n" .
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 6/8] gitweb: Highlight matched part of project description when searching projects
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
                   ` (4 preceding siblings ...)
  2012-02-15 20:38 ` [PATCHv2 5/8] gitweb: Highlight matched part of project name when searching projects Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-15 20:38 ` [PATCHv3/RFC 7/8] gitweb: Highlight matched part of shortened project description Jakub Narebski
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Use esc_html_match_hl() from previous commit to mark match in the
_whole_ description when searching projects.

Currently, with this commit, when searching projects there is always
shown full description of a project, and not a shortened one (like for
ordinary projects list view), even if the match is on project name and
not project description.

Showing full description when there is match on it is useful to avoid
situation where match is in shortened, invisible part... well, perhaps
that could be solved (showing shortened description), but it would
require much more complicated code.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
The part about showing match using shortened description no longer
applies after the following patch... though it is an RFC for now,
that is why it is not mentioned in the commit message.

NOTE that we are actually always showing full description, so having
full description in "title" attribute to show on mouseover over
shortened title is no longer necessary, and should probably be
fixed... that is unless the next patch that highlights matches in
shortened description is accepted.

No changes from previous version (but see paragraph above about
possible changes).

 gitweb/gitweb.perl |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5596701..a109ebb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5408,7 +5408,10 @@ sub git_project_list_rows {
 		      "</td>\n" .
 		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
 		                        -class => "list", -title => $pr->{'descr_long'}},
-		                        esc_html($pr->{'descr'})) . "</td>\n" .
+		                        $search_regexp
+		                        ? esc_html_match_hl($pr->{'descr_long'}, $search_regexp)
+		                        : esc_html($pr->{'descr'})) .
+		      "</td>\n" .
 		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
 		print "<td class=\"". age_class($pr->{'age'}) . "\">" .
 		      (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv3/RFC 7/8] gitweb: Highlight matched part of shortened project description
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
                   ` (5 preceding siblings ...)
  2012-02-15 20:38 ` [PATCHv2 6/8] gitweb: Highlight matched part of project description " Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-15 20:38 ` [PATCHv2 8/8] gitweb: Use esc_html_match_hl() in 'grep' search Jakub Narebski
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Previous commit make gitweb use esc_html_match_hl() to mark match in
the _whole_ description of a project when searching projects.

This commit makes gitweb highlight match in _shortened_ description,
based on match in whole description, using esc_html_match_hl_chopped()
subroutine.

If match is in removed (chopped) part, even partially, then trailing
"... " is highlighted.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
It is marked as RFC, because I am not sure if it is right way to
highlight match in shortened string, or if we better use full string,
or full string if match is in chopped part.

Changes from v2:
* Harden esc_html_match_hl_chopped() against calling with both
  $chopped and $regexp undefined (even though it wouldn't happen
  with current code).

* esc_html_match_hl_chopped() uses now esc_html_hl_regions(),
  like esc_html_match_hl() used to do.

Changes from v1:
* Instead of esc_html_match_hl_chopped() duplicating much od code in
  esc_html_match_hl(), make esc_html_match_hl() call the *_chopped()
  one with $chopped set to undef.

  Now managing highlighting of $chopped part is just a matter of
  adjusting and filtering @matches to apply to $chopped rather than
  original $str where match was performed.

  As a side issue when match span past chop point current code uses
  one selection <span class+match">foo... </span> and not two
  <span class="match">foo</span><span class="match">... </span>

 gitweb/gitweb.perl |   40 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index a109ebb..a2e2023 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1747,11 +1747,46 @@ sub esc_html_match_hl {
 	my ($str, $regexp) = @_;
 	return esc_html($str) unless defined $regexp;
 
+	return esc_html_match_hl_chopped($str, undef, $regexp);
+}
+
+
+# highlight match (if any) of shortened string, and escape HTML
+sub esc_html_match_hl_chopped {
+	my ($str, $chopped, $regexp) = @_;
+	return esc_html(defined $chopped ? $chopped : $str) unless defined $regexp;
+
 	my @matches;
 	while ($str =~ /$regexp/g) {
 		push @matches, [$-[0], $+[0]];
 	}
-	return esc_html($str) unless @matches;
+	return esc_html(defined $chopped ? $chopped : $str) unless @matches;
+
+	# filter matches so that we mark chopped string, if it is present
+	if (defined $chopped) {
+		my $tail = "... "; # see chop_str
+		unless ($chopped =~ s/\Q$tail\E$//) {
+			$tail = '';
+		}
+		my $chop_len = length($chopped);
+		my $tail_len = length($tail);
+		my @filtered;
+
+		for my $m (@matches) {
+			if ($m->[0] > $chop_len) {
+				push @filtered, [ $chop_len, $chop_len + $tail_len ] if ($tail_len > 0);
+				last;
+			} elsif ($m->[1] > $chop_len) {
+				push @filtered, [ $m->[0], $chop_len + $tail_len ];
+				last;
+			}
+			push @filtered, $m;
+		}
+
+		# further operations are on chopped string
+		$str = $chopped . $tail;
+		@matches = @filtered;
+	}
 
 	return esc_html_hl_regions($str, 'match', @matches);
 }
@@ -5409,7 +5444,8 @@ sub git_project_list_rows {
 		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
 		                        -class => "list", -title => $pr->{'descr_long'}},
 		                        $search_regexp
-		                        ? esc_html_match_hl($pr->{'descr_long'}, $search_regexp)
+		                        ? esc_html_match_hl_chopped($pr->{'descr_long'},
+		                                                    $pr->{'descr'}, $search_regexp)
 		                        : esc_html($pr->{'descr'})) .
 		      "</td>\n" .
 		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 8/8] gitweb: Use esc_html_match_hl() in 'grep' search
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
                   ` (6 preceding siblings ...)
  2012-02-15 20:38 ` [PATCHv3/RFC 7/8] gitweb: Highlight matched part of shortened project description Jakub Narebski
@ 2012-02-15 20:38 ` Jakub Narebski
  2012-02-16 20:40 ` [PATCHv2 0/8] gitweb: Faster and improved project search Junio C Hamano
  2012-02-20  7:09 ` Junio C Hamano
  9 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-15 20:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Use esc_html_match_hl() in git_search_files() i.e. subroutine that
implements "grep" search, instead of custom code which highlighted
only one, last match.

This required enhancing esc_html_match_hl() to accept -nbsp=>1 and
pass it down to esc_html().

Note that line is untabified (tabs turned into spaces) before
highlighting match, which means that highlighting won't work e.g. for
matching tab character "\t" explicitly; but this issue was present
before this commit, and is not that easy to fix.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was not present in previous (v1) version of this patch
series.

The 'grep' search was chosen from other searches because of the
following reasons:
* 'pickaxe' search does not show matches in diff, only filenames.
* 'commit' search shortens leading text at beginning, one last match
  in the middle, and trailing text at the end; anyway I think this
  search should be rewritten to show just "log"-like view with match
  highlighting.

 gitweb/gitweb.perl |   34 ++++++++++++++--------------------
 1 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index a2e2023..36a8cca 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1723,20 +1723,22 @@ sub chop_and_escape_str {
 # '<span class="mark">foo</span>bar'
 sub esc_html_hl_regions {
 	my ($str, $css_class, @sel) = @_;
-	return esc_html($str) unless @sel;
+	my %opts = grep { ref($_) ne 'ARRAY' } @sel;
+	@sel     = grep { ref($_) eq 'ARRAY' } @sel;
+	return esc_html($str, %opts) unless @sel;
 
 	my $out = '';
 	my $pos = 0;
 
 	for my $s (@sel) {
-		$out .= esc_html(substr($str, $pos, $s->[0] - $pos))
+		$out .= esc_html(substr($str, $pos, $s->[0] - $pos), %opts)
 			if ($s->[0] - $pos > 0);
 		$out .= $cgi->span({-class => $css_class},
-		                   esc_html(substr($str, $s->[0], $s->[1] - $s->[0])));
+		                   esc_html(substr($str, $s->[0], $s->[1] - $s->[0]), %opts));
 
 		$pos = $s->[1];
 	}
-	$out .= esc_html(substr($str, $pos))
+	$out .= esc_html(substr($str, $pos), %opts)
 		if ($pos < length($str));
 
 	return $out;
@@ -1744,23 +1746,23 @@ sub esc_html_hl_regions {
 
 # highlight match (if any), and escape HTML
 sub esc_html_match_hl {
-	my ($str, $regexp) = @_;
-	return esc_html($str) unless defined $regexp;
+	my ($str, $regexp, %opts) = @_;
+	return esc_html($str, %opts) unless defined $regexp;
 
-	return esc_html_match_hl_chopped($str, undef, $regexp);
+	return esc_html_match_hl_chopped($str, undef, $regexp, %opts);
 }
 
 
 # highlight match (if any) of shortened string, and escape HTML
 sub esc_html_match_hl_chopped {
-	my ($str, $chopped, $regexp) = @_;
-	return esc_html(defined $chopped ? $chopped : $str) unless defined $regexp;
+	my ($str, $chopped, $regexp, %opts) = @_;
+	return esc_html(defined $chopped ? $chopped : $str, %opts) unless defined $regexp;
 
 	my @matches;
 	while ($str =~ /$regexp/g) {
 		push @matches, [$-[0], $+[0]];
 	}
-	return esc_html(defined $chopped ? $chopped : $str) unless @matches;
+	return esc_html(defined $chopped ? $chopped : $str, %opts) unless @matches;
 
 	# filter matches so that we mark chopped string, if it is present
 	if (defined $chopped) {
@@ -1788,7 +1790,7 @@ sub esc_html_match_hl_chopped {
 		@matches = @filtered;
 	}
 
-	return esc_html_hl_regions($str, 'match', @matches);
+	return esc_html_hl_regions($str, 'match', @matches, %opts);
 }
 
 ## ----------------------------------------------------------------------
@@ -6070,15 +6072,7 @@ sub git_search_files {
 			print "<div class=\"binary\">Binary file</div>\n";
 		} else {
 			$ltext = untabify($ltext);
-			if ($ltext =~ m/^(.*)($search_regexp)(.*)$/i) {
-				$ltext = esc_html($1, -nbsp=>1);
-				$ltext .= '<span class="match">';
-				$ltext .= esc_html($2, -nbsp=>1);
-				$ltext .= '</span>';
-				$ltext .= esc_html($3, -nbsp=>1);
-			} else {
-				$ltext = esc_html($ltext, -nbsp=>1);
-			}
+			$ltext = esc_html_match_hl($ltext, qr/$search_regexp/i, -nbsp=>1);
 			print "<div class=\"pre\">" .
 				$cgi->a({-href => $file_href.'#l'.$lno,
 				        -class => "linenr"}, sprintf('%4i', $lno)) .
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCHv2 0/8] gitweb: Faster and improved project search
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
                   ` (7 preceding siblings ...)
  2012-02-15 20:38 ` [PATCHv2 8/8] gitweb: Use esc_html_match_hl() in 'grep' search Jakub Narebski
@ 2012-02-16 20:40 ` Junio C Hamano
  2012-02-16 21:53   ` Jakub Narebski
  2012-02-20  7:09 ` Junio C Hamano
  9 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-02-16 20:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> [Cc-ing Junio because of his involvement in discussion about first
>  patch in previous version of this series.]
>
> First three patches in this series are mainly about speeding up
> project search (and perhaps in the future also project pagination).
> Well, first one is unification, refactoring and future-proofing.
> The second and third patch could be squashed together; second adds
> @fill_only, but third actually uses it.
>
> Next set of patches is about highlighting matched part, making it
> easier to recognize why project was selected, what we were searching
> for (though better page title would also help second issue).
>
> Well, fourth patch (first in set mentioned above) is here for the
> commit message, otherwise it could have been squashed with next one.
>
> Last patch in this series is beginning of using esc_html_match_hl()
> for other searches in gitweb -- the easiest part.

Notice that you never said anything about what you wanted to achieve with
this entire series?  " -- the easiest part." does not mean anything.
The easiest part of what?

Where in the series do the "faster" and "improved" come from?  What do you
exactly mean by "faster" and "improved"?  In which commit would we find
the answers to the questions like:

 - What operation was slow and how you tackled that slowness?
 - What are the benchmark results?

In general, "improve" is such a loaded word (after all, patches sent to
the list are almost always meant to "improve" things) that it almost does
not convey a single bit of information.  Are you fixing bugs?  Are you
tidying up an unreadable piece of code?  Are you fixing styles?  Are you
producing prettier output?  Are you refactoring cut-and-pasted repetition
into a common helper function?  Are you adding a new feature?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv2 0/8] gitweb: Faster and improved project search
  2012-02-16 20:40 ` [PATCHv2 0/8] gitweb: Faster and improved project search Junio C Hamano
@ 2012-02-16 21:53   ` Jakub Narebski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-16 21:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 16 Feb 2012, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > [Cc-ing Junio because of his involvement in discussion about first
> >  patch in previous version of this series.]
> >
> > First three patches in this series are mainly about speeding up
> > project search (and perhaps in the future also project pagination).
> > Well, first one is unification, refactoring and future-proofing.
> > The second and third patch could be squashed together; second adds
> > @fill_only, but third actually uses it.
> >
> > Next set of patches is about highlighting matched part, making it
> > easier to recognize why project was selected, what we were searching
> > for (though better page title would also help second issue).
> >
> > Well, fourth patch (first in set mentioned above) is here for the
> > commit message, otherwise it could have been squashed with next one.
> >
> > Last patch in this series is beginning of using esc_html_match_hl()
> > for other searches in gitweb -- the easiest part.
> 
> Notice that you never said anything about what you wanted to achieve with
> this entire series?  " -- the easiest part." does not mean anything.
> The easiest part of what?

Gitweb, besides project search, supports searching in a give repository
of a commit message ('commit', 'author', 'committer'), of files ('grep')
and of changes ('pickaxe').  From those 'grep' was easiest.  This is
mentioned in comments in 8/8.

What I'd like to achieve is unification of match highlighting code, which
reduces code duplication.  esc_html_match_hl() is also better than current
hand-written code: it highlights all matches, and not only last match.

> Where in the series do the "faster" and "improved" come from?  What do you
> exactly mean by "faster" and "improved"?  In which commit would we find
> the answers to the questions like:
> 
>  - What operation was slow and how you tackled that slowness?
>  - What are the benchmark results?

Those series make project search faster because they reduce number of
git commands that gitweb runs when generating project search results.
Before number of commands was 2*<number of projects> + 1, after 3/8
it is 2*<number of results> + 1.


For search that selects 2 repositories out of 12 total

* Before (warm cache):

  "This page took 0.867151 seconds  and 27 git commands to generate."

* After (warm cache):
 
  "This page took 0.673643 seconds  and 5 git commands to generate."

The improvement in performance is even better for larger number of
repositories, and for cold cache / evicted cache.


Should I do benchmark of git_project_list_body() subroutine before
and after?

> In general, "improve" is such a loaded word (after all, patches sent to
> the list are almost always meant to "improve" things) that it almost does
> not convey a single bit of information.  Are you fixing bugs?  Are you
> tidying up an unreadable piece of code?  Are you fixing styles?  Are you
> producing prettier output?  Are you refactoring cut-and-pasted repetition
> into a common helper function?  Are you adding a new feature?

I'm sorry about vague "improved".  What I meant is that I have added
highlighting of matched part in project search, just like e.g. Google
does unobtrusive highlighting of search phrase in page snippets.

This is especially important because project search matches either
project name or project description, and it might be not obvious why
the match... especially that match on description might be in shortened
chopped-out part.


I'll improve commit messages and cover letter in a re-roll.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv2 2/8] gitweb: Option for filling only specified info in fill_project_list_info
  2012-02-15 20:38 ` [PATCHv2 2/8] gitweb: Option for filling only specified info in fill_project_list_info Jakub Narebski
@ 2012-02-20  6:58   ` Junio C Hamano
  2012-02-22 22:05     ` Jakub Narebski
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-02-20  6:58 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Junio C Hamano

Jakub Narebski <jnareb@gmail.com> writes:

> Introduce project_info_needs_filling($pr, $key[, \%fill_only]), which

That's "project_info_needs_filling($pr, @key[, \%fill_only])", isn't it?

> +# entry for given @keys needs filling if at least one of interesting keys
> +# in list is not present in %$project_info; key is interesting if $fill_only
> +# is not passed, or is empty (all keys are interesting in both of those cases),
> +# or if key is in $fill_only hash
> +#
> +# USAGE:
> +# * project_info_needs_filling($project_info, 'key', ...)
> +# * project_info_needs_filling($project_info, 'key', ..., \%fill_only)
> +#   where %fill_only = map { $_ => 1 } @fill_only;

The reason this spells bad to me is that it gives the readers (and anybody
who may want to touch gitweb code) that the caller has _two_ ways to say
the same thing.

If a caller is interested in finding out $key in $project_info, it can
either (1) pass $key as one of the earlier parameters to the function and
not pass any \%fill_only, or (2) pass $key and pass \%fill_only but after
making sure \%fill_only also has $key defined.

And if the caller passes \%fill_only, it has to remember that it needs to
add $key to it; otherwise the earlier request "I want $key" is ignored by
the function.

Why is it a good thing to have such a convoluted API?

Is it that "fill_project_list_info" is called by multiple callers, and
they do not necessarily need to see all the fields that can be filled by
the function (namely, age, age_string, descr, descr_long, owner, ctags and
category)?  If that is the case, I must say that the approach to put the
set filtering logic inside project_info_needs_filling function smells like
a bad API design.

In other words, wouldn't a callchain that uses a more natural set of API
functions be like this?

	# the top-level caller that is interested only in these two fields
        fill_project_list_info($projlist, 'age', 'owner');

        # in fill_project_list_info()
	my $projlist = shift;
        my %wanted = map { $_ => 1 } @_;

	foreach my $pr (@$projlist) {
		if (project_info_needs_filling($pr,
			filter_set(\%wanted, 'age', 'age_string'))) {
			... get @activity ...
                        ($pr->{'age'}, $pr->{'age_string'}) = @activity;
		}
		...
	}

That way, "needs_filling" has to do one and only one thing well: it checks
if any of the fields it was asked is missing and answers.  And a generic
set filtering helper that takes a filtering hashref and an array can be
used later outside of the "needs_filling" function.

>  sub project_info_needs_filling {
> +	my $fill_only = ref($_[-1]) ? pop : undef;
>  	my ($project_info, @keys) = @_;
>  
>  	# return List::MoreUtils::any { !exists $project_info->{$_} } @keys;
>  	foreach my $key (@keys) {
> -		if (!exists $project_info->{$key}) {
> +		if ((!$fill_only || !%$fill_only || $fill_only->{$key}) &&
> +		    !exists $project_info->{$key}) {
>  			return 1;
>  		}
>  	}
>  	return;
>  }
>  
> -

Shouldn't the previous patch updated not to add this blank line in the
first place?

>  # fills project list info (age, description, owner, category, forks)

Is this enumeration up-to-date?  If we cannot keep it up-to-date, it may
make sense to show only a few as examples and add ellipses.

>  # for each project in the list, removing invalid projects from
> -# returned list
> +# returned list, or fill only specified info (removing invalid projects
> +# only when filling 'age').

It is unclear what the added clause wants to say; especially, the link
between the mention of 'age' and 'only when' is unclear.  Is asking of
'age' what makes a request a notable exception to remove invalid projects?
Or is asking to partially fill fields what triggers the removal of invalid
projects, and you mentioned 'age' as an example?

If it is the former (I am guessing it is), it would be much clearer to
say:

    Invalid projects are removed from the returned list if and only if you
    ask 'age' to be filled, because of such and such reasons.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv2 0/8] gitweb: Faster and improved project search
  2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
                   ` (8 preceding siblings ...)
  2012-02-16 20:40 ` [PATCHv2 0/8] gitweb: Faster and improved project search Junio C Hamano
@ 2012-02-20  7:09 ` Junio C Hamano
  2012-02-22 22:09   ` Jakub Narebski
  9 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-02-20  7:09 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> [Cc-ing Junio because of his involvement in discussion about first
>  patch in previous version of this series.]
>
> First three patches in this series are mainly about speeding up
> project search (and perhaps in the future also project pagination).
> Well, first one is unification, refactoring and future-proofing.
> The second and third patch could be squashed together; second adds
> @fill_only, but third actually uses it.

I'll queue these three separately to a topic jn/gitweb-search-optim,
and fork another topic jn/gitweb-hilite-regions from there.  I haven't
looked the latter closely, though.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv2 3/8] gitweb: Faster project search
  2012-02-15 20:38 ` [PATCHv2 3/8] gitweb: Faster project search Jakub Narebski
@ 2012-02-20  8:33   ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2012-02-20  8:33 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> Before searching by some field the information we search for must be
> filled in.  For this fill_project_list_info() was enhanced in previous
> commit to take additional parameters which part of projects info to

"must be filled in." is correct, but that happens without the previous
patch.  The first sentence must also say:

	In order to search by some field, the information we look for must
	be filled in, but we do not have to fill other fields that are not
	involved in the search.

to justify the previous "fill_project_list_info can be asked to return
without getting unused fields" patch.  The rest of the log message then
makes good sense.

> fill.  This way we can limit doing expensive calculations (like
> running git-for-each-ref to get 'age' / "Last changed" info) only to
> projects which we will show as search results.
>
> With this commit the number of git commands used to generate search
> results is 2*<matched projects> + 1, and depends on number of matched
> projects rather than number of all projects (all repositories).
>
> Note: this is 'git for-each-ref' to find last activity, and 'git config'
> for each project, and 'git --version' once.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv2 2/8] gitweb: Option for filling only specified info in fill_project_list_info
  2012-02-20  6:58   ` Junio C Hamano
@ 2012-02-22 22:05     ` Jakub Narebski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-22 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, 20 Feb 2012, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > Introduce project_info_needs_filling($pr, $key[, \%fill_only]), which
> 
> That's "project_info_needs_filling($pr, @key[, \%fill_only])", isn't it?

Yes it is.  I'm sorry, I have missed that this needs fixing from the
first version of patch series.  Since then I have noticed that in few
cases we fill two fields at once, so we have to ask about two fields
at once as well.

> > +# entry for given @keys needs filling if at least one of interesting keys
> > +# in list is not present in %$project_info; key is interesting if $fill_only
> > +# is not passed, or is empty (all keys are interesting in both of those cases),
> > +# or if key is in $fill_only hash
> > +#
> > +# USAGE:
> > +# * project_info_needs_filling($project_info, 'key', ...)
> > +# * project_info_needs_filling($project_info, 'key', ..., \%fill_only)
> > +#   where %fill_only = map { $_ => 1 } @fill_only;
> 
> The reason this spells bad to me is that it gives the readers (and anybody
> who may want to touch gitweb code) that the caller has _two_ ways to say
> the same thing.
> 
> If a caller is interested in finding out $key in $project_info, it can
> either (1) pass $key as one of the earlier parameters to the function and
> not pass any \%fill_only, or (2) pass $key and pass \%fill_only but after
> making sure \%fill_only also has $key defined.
> 
> And if the caller passes \%fill_only, it has to remember that it needs to
> add $key to it; otherwise the earlier request "I want $key" is ignored by
> the function.

Yes, you are right.  All of this is caused by fill_project_list_info()
trying to do *two* things, instead of doing one thing and doing it well,
as is Unix philosophy.  It tried to check both that key is needed and
that key is wanted.

> Why is it a good thing to have such a convoluted API?

Thanks for a sanity check.

> Is it that "fill_project_list_info" is called by multiple callers, and
> they do not necessarily need to see all the fields that can be filled by
> the function (namely, age, age_string, descr, descr_long, owner, ctags and
> category)?  If that is the case, I must say that the approach to put the
> set filtering logic inside project_info_needs_filling function smells like
> a bad API design.

You are right.

> In other words, wouldn't a callchain that uses a more natural set of API
> functions be like this?
> 
> 	# the top-level caller that is interested only in these two fields
> 	fill_project_list_info($projlist, 'age', 'owner');
> 
> 	# in fill_project_list_info()
> 	my $projlist = shift;
> 	my %wanted = map { $_ => 1 } @_;
> 
> 	foreach my $pr (@$projlist) {
> 		if (project_info_needs_filling($pr,
> 			filter_set(\%wanted, 'age', 'age_string'))) {

That or

 		if (project_info_needs_filling($pr,  'age', 'age_string') &&
 		    project_info_is_wanted(\%wanted, 'age', 'age_string')) {

and in your case there is no duplication of field names.

filter_set() is simply intersection of two sets, but it treats empty
%wanted as a full set.

> 			... get @activity ...
>                         ($pr->{'age'}, $pr->{'age_string'}) = @activity;
> 		}
> 		...
> 	}
> 
> That way, "needs_filling" has to do one and only one thing well: it checks
> if any of the fields it was asked is missing and answers.  And a generic
> set filtering helper that takes a filtering hashref and an array can be
> used later outside of the "needs_filling" function.

Right.

> >  sub project_info_needs_filling {
> > +	my $fill_only = ref($_[-1]) ? pop : undef;
> >  	my ($project_info, @keys) = @_;
> >  
> >  	# return List::MoreUtils::any { !exists $project_info->{$_} } @keys;
> >  	foreach my $key (@keys) {
> > -		if (!exists $project_info->{$key}) {
> > +		if ((!$fill_only || !%$fill_only || $fill_only->{$key}) &&
> > +		    !exists $project_info->{$key}) {
> >  			return 1;
> >  		}
> >  	}
> >  	return;
> >  }
> >  
> > -
> 
> Shouldn't the previous patch updated not to add this blank line in the
> first place?

O.K.

> >  # fills project list info (age, description, owner, category, forks)
> 
> Is this enumeration up-to-date?  If we cannot keep it up-to-date, it may
> make sense to show only a few as examples and add ellipses.

O.K.

> >  # for each project in the list, removing invalid projects from
> > -# returned list
> > +# returned list, or fill only specified info (removing invalid projects
> > +# only when filling 'age').
> 
> It is unclear what the added clause wants to say; especially, the link
> between the mention of 'age' and 'only when' is unclear.  Is asking of
> 'age' what makes a request a notable exception to remove invalid projects?
> Or is asking to partially fill fields what triggers the removal of invalid
> projects, and you mentioned 'age' as an example?
> 
> If it is the former (I am guessing it is), it would be much clearer to
> say:
> 
>     Invalid projects are removed from the returned list if and only if you
>     ask 'age' to be filled, because of such and such reasons.
 
O.K.


Sidenote: the reason is that to check that project is valid you really
have to run git command that requires repository in it[1].  Only when
filling 'age' and 'age_string' using git_get_last_activity() we always
run git command.

[1] Gitweb actually uses --git-dir=<PATH> parameter to git wrapper.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv2 0/8] gitweb: Faster and improved project search
  2012-02-20  7:09 ` Junio C Hamano
@ 2012-02-22 22:09   ` Jakub Narebski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2012-02-22 22:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, 20 Feb 2012, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > [Cc-ing Junio because of his involvement in discussion about first
> >  patch in previous version of this series.]
> >
> > First three patches in this series are mainly about speeding up
> > project search (and perhaps in the future also project pagination).
> > Well, first one is unification, refactoring and future-proofing.
> > The second and third patch could be squashed together; second adds
> > @fill_only, but third actually uses it.
> 
> I'll queue these three separately to a topic jn/gitweb-search-optim,
> and fork another topic jn/gitweb-hilite-regions from there.  I haven't
> looked the latter closely, though.

O.K., when rerolling the series I will resend those in separate patch
series: one for performance improvements for project search (less calls
to git commands), one for match highlighting in project search ('grep',
'commit' and other per-project searches already highlight their matches,
though in suboptimal way), and perhaps one for using esc_html_match_hl()
thorough gitweb,... though with only one patch in this series for now
perhaps it be better joined with match highlighting for project search.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-02-22 22:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-15 20:38 [PATCHv2 0/8] gitweb: Faster and improved project search Jakub Narebski
2012-02-15 20:38 ` [PATCHv2 1/8] gitweb: Refactor checking if part of project info need filling Jakub Narebski
2012-02-15 20:38 ` [PATCHv2 2/8] gitweb: Option for filling only specified info in fill_project_list_info Jakub Narebski
2012-02-20  6:58   ` Junio C Hamano
2012-02-22 22:05     ` Jakub Narebski
2012-02-15 20:38 ` [PATCHv2 3/8] gitweb: Faster project search Jakub Narebski
2012-02-20  8:33   ` Junio C Hamano
2012-02-15 20:38 ` [PATCHv2 4/8] gitweb: Introduce esc_html_hl_regions Jakub Narebski
2012-02-15 20:38 ` [PATCHv2 5/8] gitweb: Highlight matched part of project name when searching projects Jakub Narebski
2012-02-15 20:38 ` [PATCHv2 6/8] gitweb: Highlight matched part of project description " Jakub Narebski
2012-02-15 20:38 ` [PATCHv3/RFC 7/8] gitweb: Highlight matched part of shortened project description Jakub Narebski
2012-02-15 20:38 ` [PATCHv2 8/8] gitweb: Use esc_html_match_hl() in 'grep' search Jakub Narebski
2012-02-16 20:40 ` [PATCHv2 0/8] gitweb: Faster and improved project search Junio C Hamano
2012-02-16 21:53   ` Jakub Narebski
2012-02-20  7:09 ` Junio C Hamano
2012-02-22 22:09   ` Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).