git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 0/3] gitweb: Faster project search
@ 2012-02-23 15:54 Jakub Narebski
  2012-02-23 15:54 ` [PATCHv3 1/3] gitweb: Refactor checking if part of project info need filling Jakub Narebski
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jakub Narebski @ 2012-02-23 15:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

[Cc-ing Junio because of his involvement in discussion]

These patches are separated from first part of previous version of
this series

  "[PATCHv2 0/8] gitweb: Faster and improved project search"
  http://thread.gmane.org/gmane.comp.version-control.git/190852

It is meant to replace 'jn/gitweb-search-optim' in pu

The intent of this series is to make project search faster by reducing
number of git commands (and I/O).  This is done by first searching
projects, then filling all project info, rather than the reverse.


Jakub Narebski (3):
  gitweb: Refactor checking if part of project info need filling
  gitweb: Option for filling only specified info in
    fill_project_list_info
  gitweb: Faster project search

 gitweb/gitweb.perl |   66 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 53 insertions(+), 13 deletions(-)

-- 
1.7.9

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCHv3 1/3] gitweb: Refactor checking if part of project info need filling
  2012-02-23 15:54 [PATCHv3 0/3] gitweb: Faster project search Jakub Narebski
@ 2012-02-23 15:54 ` Jakub Narebski
  2012-02-23 15:54 ` [PATCHv3 2/3] gitweb: Option for filling only specified info in fill_project_list_info Jakub Narebski
  2012-02-23 15:54 ` [PATCHv3 3/3] gitweb: Faster project search Jakub Narebski
  2 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2012-02-23 15:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Extract the check if given keys (given parts) of project info needs to
be filled into project_info_needs_filling() subroutine.  It is for now
a thin wrapper around "!exists $project_info->{$key}".

Note that !defined was replaced by more correct !exists.

While at it uniquify treating of all project info, adding checks for
'age' field before running git_get_last_activity(), and also checking
for all keys filled in code protected by conditional, and not only
one.

The code now looks like this

  foreach my $project (@$project_list) {
  	if (given keys need to be filled) {
  		fill given keys
  	}
  	...
  }

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch appeared for first time in v2 of this series, and split off
introduction of project_info_needs_filling() function from partial
filling via @wanted_keys... which in this version of series is not
done in project_info_needs_filling() anyway.

Almost no changes from v2:
* Remove duplication of empty line between project_info_needs_filling()
  and fill_project_list_info()

 gitweb/gitweb.perl |   34 ++++++++++++++++++++++++++--------
 1 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 057ba5b..cdd84c7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5195,6 +5195,20 @@ sub git_project_search_form {
 	print "</div>\n";
 }
 
+# entry for given @keys needs filling if at least one of keys in list
+# is not present in %$project_info
+sub project_info_needs_filling {
+	my ($project_info, @keys) = @_;
+
+	# return List::MoreUtils::any { !exists $project_info->{$_} } @keys;
+	foreach my $key (@keys) {
+		if (!exists $project_info->{$key}) {
+			return 1;
+		}
+	}
+	return;
+}
+
 # fills project list info (age, description, owner, category, forks)
 # for each project in the list, removing invalid projects from
 # returned list
@@ -5206,24 +5220,28 @@ sub fill_project_list_info {
 	my $show_ctags = gitweb_check_feature('ctags');
  PROJECT:
 	foreach my $pr (@$projlist) {
-		my (@activity) = git_get_last_activity($pr->{'path'});
-		unless (@activity) {
-			next PROJECT;
+		if (project_info_needs_filling($pr, 'age', 'age_string')) {
+			my (@activity) = git_get_last_activity($pr->{'path'});
+			unless (@activity) {
+				next PROJECT;
+			}
+			($pr->{'age'}, $pr->{'age_string'}) = @activity;
 		}
-		($pr->{'age'}, $pr->{'age_string'}) = @activity;
-		if (!defined $pr->{'descr'}) {
+		if (project_info_needs_filling($pr, 'descr', 'descr_long')) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
 			$descr = to_utf8($descr);
 			$pr->{'descr_long'} = $descr;
 			$pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
 		}
-		if (!defined $pr->{'owner'}) {
+		if (project_info_needs_filling($pr, 'owner')) {
 			$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
 		}
-		if ($show_ctags) {
+		if ($show_ctags &&
+		    project_info_needs_filling($pr, 'ctags')) {
 			$pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
 		}
-		if ($projects_list_group_categories && !defined $pr->{'category'}) {
+		if ($projects_list_group_categories &&
+		    project_info_needs_filling($pr, 'category')) {
 			my $cat = git_get_project_category($pr->{'path'}) ||
 			                                   $project_list_default_category;
 			$pr->{'category'} = to_utf8($cat);
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCHv3 2/3] gitweb: Option for filling only specified info in fill_project_list_info
  2012-02-23 15:54 [PATCHv3 0/3] gitweb: Faster project search Jakub Narebski
  2012-02-23 15:54 ` [PATCHv3 1/3] gitweb: Refactor checking if part of project info need filling Jakub Narebski
@ 2012-02-23 15:54 ` Jakub Narebski
  2012-02-23 15:54 ` [PATCHv3 3/3] gitweb: Faster project search Jakub Narebski
  2 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2012-02-23 15:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Enhance fill_project_list_info() subroutine to accept optional
parameters that specify which fields in project information needs to
be filled.  If none are specified then fill_project_list_info()
behaves as it used to, and ensure that all project info is filled.

This is in preparation of future lazy filling of project info in
project search and pagination of sorted list of projects.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This could have been squashed with the next commit, but this way it is
pure refactoring that shouldn't change gitweb behavior, and hopefully
makes this patch easier to review.

Changes from v2:
* Instead of overloading project_info_needs_filling() to do both
  handling of which fields need to be filled, and which fields
  we want to have filled, just make its caller filter its arguments.

  This means that in this version of patch there are no changes to
  project_info_needs_filling() subroutine.

# Instead of handling both cases (the one with set of wanted keys,
  and the one where we want to fill all remaining info) in a single
  filtering subroutine, use two different anonymous subroutines
  to filter or not filter keys.

* Expanded comment about fill_project_list_info to include detailed
  description of when it does remove projects from returned list (and
  why it is), and examples for both types of usage of this subroutine.

* Rewritten major part of commit message, which in v2 was not updated
  to reflect new development, and referred to behavior in v1.

[Should I include changes from v1 here, in the future?]


On Mon, 20 Feb 2012, Junio C Hamano wrote
in http://thread.gmane.org/gmane.comp.version-control.git/190852/focus=191046
JH>
JH> I must say that the approach to put the set filtering logic inside
JH> project_info_needs_filling function smells like a bad API design.
JH>
JH> In other words, wouldn't a callchain that uses a more natural set of API
JH> functions be like this?
JH> 
JH>       # the top-level caller that is interested only in these two fields
JH>       fill_project_list_info($projlist, 'age', 'owner');
JH> 
JH>       # in fill_project_list_info()
JH>       my $projlist = shift;
JH>       my %wanted = map { $_ => 1 } @_;
JH> 
JH>       foreach my $pr (@$projlist) {
JH>               if (project_info_needs_filling($pr,
JH>                       filter_set(\%wanted, 'age', 'age_string'))) {

I think this might be an even better solution, instead of handling all
keys / selected keys in filter_set(), provide different $filter_set
subroutines:

        foreach my $pr (@$projlist) {
                  if (project_info_needs_filling($pr,
                          $filter_set->('age', 'age_string'))) {


 gitweb/gitweb.perl |   33 +++++++++++++++++++++++++--------
 1 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cdd84c7..7fb7a55 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5209,39 +5209,56 @@ sub project_info_needs_filling {
 	return;
 }
 
-# fills project list info (age, description, owner, category, forks)
+# fills project list info (age, description, owner, category, forks, etc.)
 # for each project in the list, removing invalid projects from
-# returned list
+# returned list, or fill only specified info.
+#
+# Invalid projects are removed from the returned list if and only if you
+# ask 'age' or 'age_string' to be filled, because they are the only fields
+# that run unconditionally git command that requires repository, and
+# therefore do always check if project repository is invalid.
+#
+# USAGE:
+# * fill_project_list_info(\@project_list, 'descr_long', 'ctags')
+#   ensures that 'descr_long' and 'ctags' fields are filled
+# * @project_list = fill_project_list_info(\@project_list)
+#   ensures that all fields are filled (and invalid projects removed)
+#
 # NOTE: modifies $projlist, but does not remove entries from it
 sub fill_project_list_info {
-	my $projlist = shift;
+	my ($projlist, @wanted_keys) = @_;
 	my @projects;
+	my $filter_set = sub { return @_; };
+	if (@wanted_keys) {
+		my %wanted_keys = map { $_ => 1 } @wanted_keys;
+		$filter_set = sub { return grep { $wanted_keys{$_} } @_; };
+	}
 
 	my $show_ctags = gitweb_check_feature('ctags');
  PROJECT:
 	foreach my $pr (@$projlist) {
-		if (project_info_needs_filling($pr, 'age', 'age_string')) {
+		if (project_info_needs_filling($pr, $filter_set->('age', 'age_string'))) {
 			my (@activity) = git_get_last_activity($pr->{'path'});
 			unless (@activity) {
 				next PROJECT;
 			}
 			($pr->{'age'}, $pr->{'age_string'}) = @activity;
 		}
-		if (project_info_needs_filling($pr, 'descr', 'descr_long')) {
+		if (project_info_needs_filling($pr, $filter_set->('descr', 'descr_long'))) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
 			$descr = to_utf8($descr);
 			$pr->{'descr_long'} = $descr;
 			$pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
 		}
-		if (project_info_needs_filling($pr, 'owner')) {
+		if (project_info_needs_filling($pr, $filter_set->('owner'))) {
 			$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
 		}
 		if ($show_ctags &&
-		    project_info_needs_filling($pr, 'ctags')) {
+		    project_info_needs_filling($pr, $filter_set->('ctags'))) {
 			$pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
 		}
 		if ($projects_list_group_categories &&
-		    project_info_needs_filling($pr, 'category')) {
+		    project_info_needs_filling($pr, $filter_set->('category'))) {
 			my $cat = git_get_project_category($pr->{'path'}) ||
 			                                   $project_list_default_category;
 			$pr->{'category'} = to_utf8($cat);
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCHv3 3/3] gitweb: Faster project search
  2012-02-23 15:54 [PATCHv3 0/3] gitweb: Faster project search Jakub Narebski
  2012-02-23 15:54 ` [PATCHv3 1/3] gitweb: Refactor checking if part of project info need filling Jakub Narebski
  2012-02-23 15:54 ` [PATCHv3 2/3] gitweb: Option for filling only specified info in fill_project_list_info Jakub Narebski
@ 2012-02-23 15:54 ` Jakub Narebski
  2 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2012-02-23 15:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski

Before searching by some field the information we search for must be
filled in, but we do not have to fill other fields that are not
involved in the search.

To be able to request filling only specified fields,
fill_project_list_info() was enhanced in previous commit to take
additional parameters which specify part of projects info to fill.
This way we can limit doing expensive calculations (like running
git-for-each-ref to get 'age' / "Last changed" info) to doing those
only for projects which we will show as search results.

This commit actually uses this interface, changing gitweb code from
the following behavior

  fill all project info on all projects
  search projects

to behaving like this pseudocode

  fill search fields on all projects
  search projects
  fill all project info on search results


With this commit the number of git commands used to generate search
results is 2*<matched projects> + 1, and depends on number of matched
projects rather than number of all projects (all repositories).

Note: this is 'git for-each-ref' to find last activity, and 'git config'
for each project, and 'git --version' once.


Example performance improvements, for search that selects 2
repositories out of 12 in total:

* Before (warm cache):
  "This page took 0.867151 seconds  and 27 git commands to generate."

* After (warm cache):
  "This page took 0.673643 seconds  and 5 git commands to generate."

Now imagine that they are 5 repositories out of 5000, and cold or
trashed cache case.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Changes from v2 (v2 is the same as v1):
* Rewritten and extended commit message (though extending perhaps have
  gone too far...).

  Added paragraphs are "This commit actually uses this interface..."
  and "Example performance improvements..."


Junio C Hamano wrote in
http://thread.gmane.org/gmane.comp.version-control.git/190852/focus=191053
[...]
JC> "must be filled in." is correct, but that happens without the previous
JC> patch.  The first sentence must also say:
JC> 
JC>         In order to search by some field, the information we look for must
JC>         be filled in, but we do not have to fill other fields that are not
JC>         involved in the search.
JC> 
JC> to justify the previous "fill_project_list_info can be asked to return
JC> without getting unused fields" patch.  

Done.  Thanks for a suggestion.

JC> The rest of the log message then makes good sense.

Ooops...

 gitweb/gitweb.perl |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7fb7a55..4ceb1a6 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2987,6 +2987,10 @@ sub search_projects_list {
 	return @$projlist
 		unless ($tagfilter || $searchtext);
 
+	# searching projects require filling to be run before it;
+	fill_project_list_info($projlist,
+	                       $tagfilter  ? 'ctags' : (),
+	                       $searchtext ? ('path', 'descr') : ());
 	my @projects;
  PROJECT:
 	foreach my $pr (@$projlist) {
@@ -5394,12 +5398,13 @@ sub git_project_list_body {
 	# filtering out forks before filling info allows to do less work
 	@projects = filter_forks_from_projects_list(\@projects)
 		if ($check_forks);
-	@projects = fill_project_list_info(\@projects);
-	# searching projects require filling to be run before it
+	# search_projects_list pre-fills required info
 	@projects = search_projects_list(\@projects,
 	                                 'searchtext' => $searchtext,
 	                                 'tagfilter'  => $tagfilter)
 		if ($tagfilter || $searchtext);
+	# fill the rest
+	@projects = fill_project_list_info(\@projects);
 
 	$order ||= $default_projects_order;
 	$from = 0 unless defined $from;
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-02-23 15:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-23 15:54 [PATCHv3 0/3] gitweb: Faster project search Jakub Narebski
2012-02-23 15:54 ` [PATCHv3 1/3] gitweb: Refactor checking if part of project info need filling Jakub Narebski
2012-02-23 15:54 ` [PATCHv3 2/3] gitweb: Option for filling only specified info in fill_project_list_info Jakub Narebski
2012-02-23 15:54 ` [PATCHv3 3/3] gitweb: Faster project search Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).