git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] gitweb: Improve ctags, introduce categories
@ 2011-04-29 17:51 Jakub Narebski
  2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
	Jakub Narebski

This series is composed of two independent parts: improvements to
content tags (ctags) handling in patches 2-3, and introducing optional
grouping of projects by categories in patches 4-6 (where 4-5 are
preparatory).

Both of those series are based on 1st patch in series which
restructures how projects list is being generated, to make those
changes easy to introduce.  Because both of those depend on the same
introductory patch they are put in single series... that and the fact
that there was trivial conflict to be resolved in rebasing 'project
category' part on top of 'ctags' part.

Table of contents:
~~~~~~~~~~~~~~~~~~
* [PATCH 1/6 (v2)] gitweb: Restructure projects list generation

  This patch extracts filtering out "forks" and searching for projects
  (i.e. limiting listed projects) into separate subroutines.  Also
  filtering is done upfront and not while printing.  This makes for
  more clear code and makes subsequent changes much easier.

  Filtering forks is now faster thanks to using trie; see comments in
  code and in the patch itself.

* [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled

  Making ctags info parsing more robust sort of fixes bug noticed in

    gitweb: cloud tags feature produces malformed XML for errors
    http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/focus=168266
    http://bugs.debian.org/616005

  caused by myshandling underdocumented then 'ctags' feature.  This
  patch DOES NOT however resolve deeper fundamental problem with error
  / exception handling in gitweb after some data was already sent to a
  browser.

  It also removes gitweb half of interface for adding content tags
  (labels) to a project; it was quarter of feature at best (there
  is/was no example implementation of server side half of adding
  content tags, no way to delete or fix content tags).

* [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)

  This is simple improvement, done while working on 'ctags' feature.

* [PATCH 4/6] gitweb: Split git_project_list_body in two functions
  [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic

  Those two patches are refactoring preparing way for project
  categories (for the next patch).

* [PATCH 6/6] gitweb: Optional grouping of projects by category

  Port of Sebastien Cevey port of Sham Chukoury's patch for the XMMS2
  gitweb to modern gitweb.

Shortlog:
~~~~~~~~~
Jakub Narebski (3):
  gitweb: Restructure projects list generation
  gitweb: Change the way "content tags" ('ctags') are handled
  gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)

Sebastien Cevey (3):
  gitweb: Split git_project_list_body in two functions
  gitweb: Modularized git_get_project_description to be more generic
  gitweb: Optional grouping of projects by category

Diffstat:
~~~~~~~~~
 gitweb/README                             |   16 +
 gitweb/gitweb.perl                        |  546 ++++++++++++++++++++---------
 gitweb/static/gitweb.css                  |    7 +
 t/t9500-gitweb-standalone-no-errors.sh    |   57 +++
 t/t9502-gitweb-standalone-parse-output.sh |   74 ++++
 5 files changed, 542 insertions(+), 158 deletions(-)

-- 
1.7.3

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6 (v2)] gitweb: Restructure projects list generation
  2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
  2011-05-07 18:39   ` Jakub Narebski
  2011-04-29 17:51 ` [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled Jakub Narebski
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
	Jakub Narebski

Extract filtering out forks (which is done if 'forks' feature is
enabled) into filter_forks_from_projects_list subroutine, and
searching projects (via projects search form, or via content tags)
into search_projects_list subroutine.

Both are now run _before_ displaying projects, and not while printing;
this allow to know upfront if there were any found projects.  Gitweb
now can and do print 'No such projects found' if user searches for
phrase which does not correspond to any project (any repository).
This also would allow splitting projects list into pages, if we so
desire.

Filtering out forks and marking repository (project) as having forks
is now consolidated into one subroutine (special case of handling
forks in git_get_projects_list only for $projects_list being file is
now removed).  Forks handling is also cleaned up and simplified.
$pr->{'forks'} now contains un-filled list of forks; we can now also
detect situation where the way for having forks is prepared, but there
are no forks yet.

Sorting projects got also refactored in a very straight way (just
moving code) into sort_projects_list subroutine.

The interaction between forks, content tags and searching is now made
more explicit: searching whether by tag, or via search form turns off
fork filtering (gitweb searches also forks, and will show all
results).  If 'ctags' feature is disabled, then searching by tag is
too.

The t9500 test now includes some basic test for 'forks' and 'ctags'
features; the t9502 includes test checking if gitweb correctly filters
out forks.


Generating list of projects by scanning given directory is now also a
bit simplified wrt. handling filtering; it is byproduct of extracting
filtering forks to separate subroutine.

While at it we now detect that there are no projects and respond with
"404 No projects found" also for 'project_index' and 'opml' actions.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was sent to git mailing list as
  
  [PATCHv2/RFC] gitweb: Restructure projects list generation
  http://thread.gmane.org/gmane.comp.version-control.git/167996/focus=168321

The main difference from previous version

  [PATCH/RFC] gitweb: Restructure projects list generation
  http://thread.gmane.org/gmane.comp.version-control.git/167996

is that newly introduced filter_forks_from_projects_list() no longer
relies on the assumption that forked project immediately precedes all
of its forks... which is not true when $porjects_list is a directory
and we have e.g. such situation:

  repo.git
  repo.related.git
  repo_other.git
  repo/fork.git

This issue was noticed by Jonathan Nieder.

filter_forks_from_projects_list() uses trie[1] (prefix tree) to find
which repositories are forks of other projects; directories are used
as "letters" in trie.  Algorithm to create trie and find prefix match
was created with some help of code of Tree::Tree[2] Perl module from
CPAN.

[1]: http://en.wikipedia.org/wiki/Trie
[2]: http://p3rl.org/Tree::Trie

This revision includes additionally tests in t9502 that correct
projects are filtered.


Note that before this commit filtering out forks worked only for
$projects_list being a file, it assumed that forks are always after
forked project (not very unreasonable), and used algorithm which is
O(<projects>^2).

Current code filters out projects only where it should, regardless
whether $projects_list is directory or a file, and trie-based
algorithm is O(<projects>) + O(<projects forked>*<max depth>).
Current code also consistently finds shortest prefix (important in
counting forks in case of forks of forks).


There remains a question whether $pr->{'forks'} should be a list of
forks, or would having $pr->{'forks'} being number of forks be enough.

Refactoring sort_projects_list and fixing handling of $project_maxdepth
could be done in a separate patches.

 gitweb/gitweb.perl                        |  250 ++++++++++++++++++++---------
 t/t9500-gitweb-standalone-no-errors.sh    |   49 ++++++
 t/t9502-gitweb-standalone-parse-output.sh |   74 +++++++++
 3 files changed, 296 insertions(+), 77 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index ee69ea6..014b33b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2651,21 +2651,23 @@ sub git_get_project_url_list {
 }
 
 sub git_get_projects_list {
-	my ($filter) = @_;
+	my $filter = shift || '';
 	my @list;
 
-	$filter ||= '';
 	$filter =~ s/\.git$//;
 
-	my $check_forks = gitweb_check_feature('forks');
-
 	if (-d $projects_list) {
 		# search in directory
-		my $dir = $projects_list . ($filter ? "/$filter" : '');
+		my $dir = $projects_list;
 		# remove the trailing "/"
 		$dir =~ s!/+$!!;
-		my $pfxlen = length("$dir");
-		my $pfxdepth = ($dir =~ tr!/!!);
+		my $pfxlen = length("$projects_list");
+		my $pfxdepth = ($projects_list =~ tr!/!!);
+		# when filtering, search only given subdirectory
+		if ($filter) {
+			$dir .= "/$filter";
+			$dir =~ s!/+$!!;
+		}
 
 		File::Find::find({
 			follow_fast => 1, # follow symbolic links
@@ -2680,14 +2682,14 @@ sub git_get_projects_list {
 				# only directories can be git repositories
 				return unless (-d $_);
 				# don't traverse too deep (Find is super slow on os x)
+				# $project_maxdepth excludes depth of $projectroot
 				if (($File::Find::name =~ tr!/!!) - $pfxdepth > $project_maxdepth) {
 					$File::Find::prune = 1;
 					return;
 				}
 
-				my $subdir = substr($File::Find::name, $pfxlen + 1);
+				my $path = substr($File::Find::name, $pfxlen + 1);
 				# we check related file in $projectroot
-				my $path = ($filter ? "$filter/" : '') . $subdir;
 				if (check_export_ok("$projectroot/$path")) {
 					push @list, { path => $path };
 					$File::Find::prune = 1;
@@ -2700,7 +2702,6 @@ sub git_get_projects_list {
 		# 'git%2Fgit.git Linus+Torvalds'
 		# 'libs%2Fklibc%2Fklibc.git H.+Peter+Anvin'
 		# 'linux%2Fhotplug%2Fudev.git Greg+Kroah-Hartman'
-		my %paths;
 		open my $fd, '<', $projects_list or return;
 	PROJECT:
 		while (my $line = <$fd>) {
@@ -2711,32 +2712,9 @@ sub git_get_projects_list {
 			if (!defined $path) {
 				next;
 			}
-			if ($filter ne '') {
-				# looking for forks;
-				my $pfx = substr($path, 0, length($filter));
-				if ($pfx ne $filter) {
-					next PROJECT;
-				}
-				my $sfx = substr($path, length($filter));
-				if ($sfx !~ /^\/.*\.git$/) {
-					next PROJECT;
-				}
-			} elsif ($check_forks) {
-			PATH:
-				foreach my $filter (keys %paths) {
-					# looking for forks;
-					my $pfx = substr($path, 0, length($filter));
-					if ($pfx ne $filter) {
-						next PATH;
-					}
-					my $sfx = substr($path, length($filter));
-					if ($sfx !~ /^\/.*\.git$/) {
-						next PATH;
-					}
-					# is a fork, don't include it in
-					# the list
-					next PROJECT;
-				}
+			# if $filter is rpovided, check if $path begins with $filter
+			if ($filter && $path !~ m!^\Q$filter\E/!) {
+				next;
 			}
 			if (check_export_ok("$projectroot/$path")) {
 				my $pr = {
@@ -2744,8 +2722,6 @@ sub git_get_projects_list {
 					owner => to_utf8($owner),
 				};
 				push @list, $pr;
-				(my $forks_path = $path) =~ s/\.git$//;
-				$paths{$forks_path}++;
 			}
 		}
 		close $fd;
@@ -2753,6 +2729,98 @@ sub git_get_projects_list {
 	return @list;
 }
 
+# written with help of Tree::Trie module (Perl Artistic License, GPL compatibile)
+# as side effects it sets 'forks' field to list of forks for forked projects
+sub filter_forks_from_projects_list {
+	my $projects = shift;
+
+	my %trie; # prefix tree of directories (path components)
+	# generate trie out of those directories that might contain forks
+	foreach my $pr (@$projects) {
+		my $path = $pr->{'path'};
+		$path =~ s/\.git$//;      # forks of 'repo.git' are in 'repo/' directory
+		next if ($path =~ m!/$!); # skip non-bare repositories, e.g. 'repo/.git'
+		next unless ($path);      # skip '.git' repository: tests, git-instaweb
+		next unless (-d $path);   # containing directory exists
+		$pr->{'forks'} = [];      # there can be 0 or more forks of project
+
+		# add to trie
+		my @dirs = split('/', $path);
+		# walk the trie, until either runs out of components or out of trie
+		my $ref = \%trie;
+		while (scalar @dirs &&
+		       exists($ref->{$dirs[0]})) {
+			$ref = $ref->{shift @dirs};
+		}
+		# create rest of trie structure from rest of components
+		foreach my $dir (@dirs) {
+			$ref = $ref->{$dir} = {};
+		}
+		# create end marker, store $pr as a data
+		$ref->{''} = $pr if (!exists $ref->{''});
+	}
+
+	# filter out forks, by finding shortest prefix match for paths
+	my @filtered;
+ PROJECT:
+	foreach my $pr (@$projects) {
+		# trie lookup
+		my $ref = \%trie;
+	DIR:
+		foreach my $dir (split('/', $pr->{'path'})) {
+			if (exists $ref->{''}) {
+				# found [shortest] prefix, is a fork - skip it
+				push @{$ref->{''}{'forks'}}, $pr;
+				next PROJECT;
+			}
+			if (!exists $ref->{$dir}) {
+				# not in trie, cannot have prefix, not a fork
+				push @filtered, $pr;
+				next PROJECT;
+			}
+			# If the dir is there, we just walk one step down the trie.
+			$ref = $ref->{$dir};
+		}
+		# we ran out of trie
+		# (shouldn't happen: it's either no match, or end marker)
+		push @filtered, $pr;
+	}
+
+	return @filtered;
+}
+
+# note: fill_project_list_info must be run first,
+# for 'descr_long' and 'ctags' to be filled
+sub search_projects_list {
+	my ($projlist, %opts) = @_;
+	my $tagfilter  = $opts{'tagfilter'};
+	my $searchtext = $opts{'searchtext'};
+
+	return @$projlist
+		unless ($tagfilter || $searchtext);
+
+	my @projects;
+ PROJECT:
+	foreach my $pr (@$projlist) {
+
+		if ($tagfilter) {
+			next unless ref($pr->{'ctags'}) eq 'HASH';
+			next unless
+				grep { lc($_) eq lc($tagfilter) } keys %{$pr->{'ctags'}};
+		}
+
+		if ($searchtext) {
+			next unless
+				$pr->{'path'} =~ /$searchtext/ ||
+				$pr->{'descr_long'} =~ /$searchtext/;
+		}
+
+		push @projects, $pr;
+	}
+
+	return @projects;
+}
+
 our $gitweb_project_owner = undef;
 sub git_get_project_list_from_file {
 
@@ -4742,7 +4810,7 @@ sub git_patchset_body {
 # project in the list, removing invalid projects from returned list
 # NOTE: modifies $projlist, but does not remove entries from it
 sub fill_project_list_info {
-	my ($projlist, $check_forks) = @_;
+	my $projlist = shift;
 	my @projects;
 
 	my $show_ctags = gitweb_check_feature('ctags');
@@ -4762,23 +4830,36 @@ sub fill_project_list_info {
 		if (!defined $pr->{'owner'}) {
 			$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
 		}
-		if ($check_forks) {
-			my $pname = $pr->{'path'};
-			if (($pname =~ s/\.git$//) &&
-			    ($pname !~ /\/$/) &&
-			    (-d "$projectroot/$pname")) {
-				$pr->{'forks'} = "-d $projectroot/$pname";
-			} else {
-				$pr->{'forks'} = 0;
-			}
+		if ($show_ctags) {
+			$pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
 		}
-		$show_ctags and $pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
 		push @projects, $pr;
 	}
 
 	return @projects;
 }
 
+sub sort_projects_list {
+	my ($projlist, $order) = @_;
+	my @projects;
+
+	my %order_info = (
+		project => { key => 'path', type => 'str' },
+		descr => { key => 'descr_long', type => 'str' },
+		owner => { key => 'owner', type => 'str' },
+		age => { key => 'age', type => 'num' }
+	);
+	my $oi = $order_info{$order};
+	return @$projlist unless defined $oi;
+	if ($oi->{'type'} eq 'str') {
+		@projects = sort {$a->{$oi->{'key'}} cmp $b->{$oi->{'key'}}} @$projlist;
+	} else {
+		@projects = sort {$a->{$oi->{'key'}} <=> $b->{$oi->{'key'}}} @$projlist;
+	}
+
+	return @projects;
+}
+
 # print 'sort by' <th> element, generating 'sort by $name' replay link
 # if that order is not selected
 sub print_sort_th {
@@ -4805,28 +4886,39 @@ sub format_sort_th {
 sub git_project_list_body {
 	# actually uses global variable $project
 	my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
+	my @projects = @$projlist;
 
 	my $check_forks = gitweb_check_feature('forks');
-	my @projects = fill_project_list_info($projlist, $check_forks);
+	my $show_ctags  = gitweb_check_feature('ctags');
+	my $tagfilter = $show_ctags ? $cgi->param('by_tag') : undef;
+	$check_forks = undef
+		if ($tagfilter || $searchtext);
+
+	# filtering out forks before filling info allows to do less work
+	@projects = filter_forks_from_projects_list(\@projects)
+		if ($check_forks);
+	@projects = fill_project_list_info(\@projects);
+	# searching projects require filling to be run before it
+	@projects = search_projects_list(\@projects,
+	                                 'searchtext' => $searchtext,
+	                                 'tagfilter'  => $tagfilter)
+		if ($tagfilter || $searchtext);
 
 	$order ||= $default_projects_order;
 	$from = 0 unless defined $from;
 	$to = $#projects if (!defined $to || $#projects < $to);
 
-	my %order_info = (
-		project => { key => 'path', type => 'str' },
-		descr => { key => 'descr_long', type => 'str' },
-		owner => { key => 'owner', type => 'str' },
-		age => { key => 'age', type => 'num' }
-	);
-	my $oi = $order_info{$order};
-	if ($oi->{'type'} eq 'str') {
-		@projects = sort {$a->{$oi->{'key'}} cmp $b->{$oi->{'key'}}} @projects;
-	} else {
-		@projects = sort {$a->{$oi->{'key'}} <=> $b->{$oi->{'key'}}} @projects;
+	# short circuit
+	if ($from > $to) {
+		print "<center>\n".
+		      "<b>No such projects found</b><br />\n".
+		      "Click ".$cgi->a({-href=>href(project=>undef)},"here")." to view all projects<br />\n".
+		      "</center>\n<br />\n";
+		return;
 	}
 
-	my $show_ctags = gitweb_check_feature('ctags');
+	@projects = sort_projects_list(\@projects, $order);
+
 	if ($show_ctags) {
 		my %ctags;
 		foreach my $p (@projects) {
@@ -4852,32 +4944,26 @@ sub git_project_list_body {
 		      "</tr>\n";
 	}
 	my $alternate = 1;
-	my $tagfilter = $cgi->param('by_tag');
 	for (my $i = $from; $i <= $to; $i++) {
 		my $pr = $projects[$i];
 
-		next if $tagfilter and $show_ctags and not grep { lc $_ eq lc $tagfilter } keys %{$pr->{'ctags'}};
-		next if $searchtext and not $pr->{'path'} =~ /$searchtext/
-			and not $pr->{'descr_long'} =~ /$searchtext/;
-		# Weed out forks or non-matching entries of search
-		if ($check_forks) {
-			my $forkbase = $project; $forkbase ||= ''; $forkbase =~ s#\.git$#/#;
-			$forkbase="^$forkbase" if $forkbase;
-			next if not $searchtext and not $tagfilter and $show_ctags
-				and $pr->{'path'} =~ m#$forkbase.*/.*#; # regexp-safe
-		}
-
 		if ($alternate) {
 			print "<tr class=\"dark\">\n";
 		} else {
 			print "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
+
 		if ($check_forks) {
 			print "<td>";
 			if ($pr->{'forks'}) {
-				print "<!-- $pr->{'forks'} -->\n";
-				print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "+");
+				my $nforks = scalar @{$pr->{'forks'}};
+				if ($nforks > 0) {
+					print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks"),
+					               -title => "$nforks forks"}, "+");
+				} else {
+					print $cgi->span({-title => "$nforks forks"}, "+");
+				}
 			}
 			print "</td>\n";
 		}
@@ -5357,7 +5443,10 @@ sub git_forks {
 }
 
 sub git_project_index {
-	my @projects = git_get_projects_list($project);
+	my @projects = git_get_projects_list();
+	if (!@projects) {
+		die_error(404, "No projects found");
+	}
 
 	print $cgi->header(
 		-type => 'text/plain',
@@ -5399,7 +5488,11 @@ sub git_summary {
 	my $check_forks = gitweb_check_feature('forks');
 
 	if ($check_forks) {
+		# find forks of a project
 		@forklist = git_get_projects_list($project);
+		# filter out forks of forks
+		@forklist = filter_forks_from_projects_list(\@forklist)
+			if (@forklist);
 	}
 
 	git_header_html();
@@ -7319,6 +7412,9 @@ sub git_atom {
 
 sub git_opml {
 	my @list = git_get_projects_list();
+	if (!@list) {
+		die_error(404, "No projects found");
+	}
 
 	print $cgi->header(
 		-type => 'text/xml',
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index afac5b5..71ef0ac 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -595,4 +595,53 @@ test_expect_success HIGHLIGHT \
 	 git commit -m "Add test.sh" &&
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 
+# ----------------------------------------------------------------------
+# forks of projects
+
+cat >>gitweb_config.perl <<\EOF &&
+$feature{'forks'}{'default'} = [1];
+EOF
+
+test_expect_success \
+	'forks: prepare' \
+	'git init --bare foo.git &&
+	 git --git-dir=foo.git --work-tree=. add file &&
+	 git --git-dir=foo.git --work-tree=. commit -m "Initial commit" &&
+	 echo "foo" > foo.git/description &&
+	 mkdir -p foo &&
+	 (cd foo &&
+	  git clone --shared --bare ../foo.git foo-forked.git &&
+	  echo "fork of foo" > foo-forked.git/description)'
+
+test_expect_success \
+	'forks: projects list' \
+	'gitweb_run'
+
+test_expect_success \
+	'forks: forks action' \
+	'gitweb_run "p=foo.git;a=forks"'
+
+# ----------------------------------------------------------------------
+# content tags (tag cloud)
+
+cat >>gitweb_config.perl <<-\EOF &&
+# we don't test _setting_ content tags, so any true value is good
+$feature{'ctags'}{'default'} = ['ctags_script.cgi'];
+EOF
+
+test_expect_success \
+	'ctags: tag cloud in projects list' \
+	'mkdir .git/ctags &&
+	 echo "2" > .git/ctags/foo &&
+	 echo "1" > .git/ctags/bar &&
+	gitweb_run'
+
+test_expect_success \
+	'ctags: search projects by existing tag' \
+	'gitweb_run "by_tag=foo"'
+
+test_expect_success \
+	'ctags: search projects by non existent tag' \
+	'gitweb_run "by_tag=non-existent"'
+
 test_done
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index dd83890..731e64c 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -112,4 +112,78 @@ test_expect_success 'snapshot: hierarchical branch name (xx/test)' '
 '
 test_debug 'cat gitweb.headers'
 
+# ----------------------------------------------------------------------
+# forks of projects
+
+test_expect_success 'forks: setup' '
+	git init --bare foo.git &&
+	echo file > file &&
+	git --git-dir=foo.git --work-tree=. add file &&
+	git --git-dir=foo.git --work-tree=. commit -m "Initial commit" &&
+	echo "foo" > foo.git/description &&
+	git clone --bare foo.git foo.bar.git &&
+	echo "foo.bar" > foo.bar.git/description &&
+	git clone --bare foo.git foo_baz.git &&
+	echo "foo_baz" > foo_baz.git/description &&
+	rm -fr   foo &&
+	mkdir -p foo &&
+	(
+		cd foo &&
+		git clone --shared --bare ../foo.git foo-forked.git &&
+		echo "fork of foo" > foo-forked.git/description
+	)
+'
+
+test_expect_success 'forks: not skipped unless "forks" feature enabled' '
+	gitweb_run "a=project_list" &&
+	grep -q ">\\.git<"               gitweb.body &&
+	grep -q ">foo\\.git<"            gitweb.body &&
+	grep -q ">foo_baz\\.git<"        gitweb.body &&
+	grep -q ">foo\\.bar\\.git<"      gitweb.body &&
+	grep -q ">foo_baz\\.git<"        gitweb.body &&
+	grep -q ">foo/foo-forked\\.git<" gitweb.body &&
+	grep -q ">fork of .*<"           gitweb.body
+'
+
+cat >>gitweb_config.perl <<\EOF &&
+$feature{'forks'}{'default'} = [1];
+EOF
+
+test_expect_success 'forks: forks skipped if "forks" feature enabled' '
+	gitweb_run "a=project_list" &&
+	grep -q ">\\.git<"               gitweb.body &&
+	grep -q ">foo\\.git<"            gitweb.body &&
+	grep -q ">foo_baz\\.git<"        gitweb.body &&
+	grep -q ">foo\\.bar\\.git<"      gitweb.body &&
+	grep -q ">foo_baz\\.git<"        gitweb.body &&
+	grep -v ">foo/foo-forked\\.git<" gitweb.body &&
+	grep -v ">fork of .*<"           gitweb.body
+'
+
+test_expect_success 'forks: "forks" action for forked repository' '
+	gitweb_run "p=foo.git;a=forks" &&
+	grep -q ">foo/foo-forked\\.git<" gitweb.body &&
+	grep -q ">fork of foo<"          gitweb.body
+'
+
+test_expect_success 'forks: can access forked repository' '
+	gitweb_run "p=foo/foo-forked.git;a=summary" &&
+	grep -q "200 OK"        gitweb.headers &&
+	grep -q ">fork of foo<" gitweb.body
+'
+
+test_expect_success 'forks: project_index lists all projects (incl. forks)' '
+	cat >expected <<-\EOF
+	.git
+	foo.bar.git
+	foo.git
+	foo/foo-forked.git
+	foo_baz.git
+	EOF
+	gitweb_run "a=project_index" &&
+	sed -e "s/ .*//" <gitweb.body | sort >actual &&
+	test_cmp expected actual
+'
+
+
 test_done
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled
  2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
  2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
  2011-04-29 17:51 ` [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo) Jakub Narebski
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
	Jakub Narebski

The major change is removing the ability to edit content tags (ctags)
in a web browser.

The interface was created by gitweb, while actual editing of tags was
to be done by external script; the API was not defined, and neither
was provided example implementation.  Such split is also a bit fragile
- interface and implementation have to be kept in sync.  Gitweb
provided only ability to add tags; you could not edit tags nor delete
them.


Format of ctags is now described in the comment above git_get_project_ctags
subroutine.  Gitweb now is more robust with respect to original ctags
format; it also accepts two new formats: $GIT_DIR/ctags file, with one
content tag per line, and multi-value `gitweb.ctag' config variable.

Gathering all ctags of all project is now put in git_gather_all_ctags
subroutine, making git_project_list_body more clear.

git_populate_project_tagcloud subroutine now generates data used for
tag cloud, including generation of ctag link, also in the case
HTML::TagCloud module is unavailable.  Links are now generated using
href() subroutine - this is more robust, as ctags might contain '?',
';' and '=' special characters that need to be escaped in query param.
Shown tags are HTML-escaped.

The generation of tag cloud in git_show_project_tagcloud in the case
when HTML::TagCloud is not available is now changed slightly.

The 'content tags' field on project summary page is made more in line
with other fields in "projects_list" table.  Because one cannot now
add new tags from web interface, this field is no longer displayed
when there are no content tags for given project.

Ctags-issue-Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Ctags-issue-Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was originally sent to git mailing list in

  [RFC/PATCH] gitweb: Change the way "content tags" ('ctags') are handled
  http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/focus=168599

as part of

  gitweb: cloud tags feature produces malformed XML for errors
  http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/focus=168266

thread, crossposted to git mailing list.


Hardening parsing of ctags files, so that gitweb does not crash on
malformed entries, but just ignores them, sort of fixes original issue
reported in

  Bug#616005: libhtml-tagcloud-perl breaks gitweb
  http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/
  http://bugs.debian.org/616005

by Uwe Kleine-König and crossposted to git mailing list by Jonathan
Nieder.

I say "sort of fixes" because this patch doesn't attempt to solve
deeper fundamental problem with error / exception handling in gitweb
after some data was already sent to a browser.

Note also that the bug was caused by mishandling (incorrectly
generating) underdocumented then 'ctags' feature.

 gitweb/gitweb.perl |  141 +++++++++++++++++++++++++++++++++++----------------
 1 files changed, 97 insertions(+), 44 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 014b33b..60cb772 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -412,20 +412,23 @@ our %feature = (
 		'override' => 0,
 		'default' => []},
 
-	# Allow gitweb scan project content tags described in ctags/
-	# of project repository, and display the popular Web 2.0-ish
-	# "tag cloud" near the project list. Note that this is something
-	# COMPLETELY different from the normal Git tags.
+	# Allow gitweb scan project content tags of project repository,
+	# and display the popular Web 2.0-ish "tag cloud" near the projects
+	# list.  Note that this is something COMPLETELY different from the
+	# normal Git tags.
 
 	# gitweb by itself can show existing tags, but it does not handle
-	# tagging itself; you need an external application for that.
-	# For an example script, check Girocco's cgi/tagproj.cgi.
+	# tagging itself; you need to do it externally, outside gitweb.
+	# The format is described in git_get_project_ctags() subroutine.
 	# You may want to install the HTML::TagCloud Perl module to get
 	# a pretty tag cloud instead of just a list of tags.
 
 	# To enable system wide have in $GITWEB_CONFIG
-	# $feature{'ctags'}{'default'} = ['path_to_tag_script'];
+	# $feature{'ctags'}{'default'} = [1];
 	# Project specific override is not supported.
+
+	# In the future whether ctags editing is enabled might depend
+	# on the value, but using 1 should always mean no editing of ctags.
 	'ctags' => {
 		'override' => 0,
 		'default' => [0]},
@@ -703,6 +706,7 @@ our @cgi_param_mapping = (
 	snapshot_format => "sf",
 	extra_options => "opt",
 	search_use_regexp => "sr",
+	ctag => "by_tag",
 	# this must be last entry (for manipulation from JavaScript)
 	javascript => "js"
 );
@@ -2572,23 +2576,66 @@ sub git_get_project_description {
 	return $descr;
 }
 
+# supported formats:
+# * $GIT_DIR/ctags/<tagname> file (in 'ctags' subdirectory)
+#   - if its contents is a number, use it as tag weight,
+#   - otherwise add a tag with weight 1
+# * $GIT_DIR/ctags file, each line is a tag (with weight 1)
+#   the same value multiple times increases tag weight
+# * `gitweb.ctag' multi-valued repo config variable
 sub git_get_project_ctags {
-	my $path = shift;
+	my $project = shift;
 	my $ctags = {};
 
-	$git_dir = "$projectroot/$path";
-	opendir my $dh, "$git_dir/ctags"
-		or return $ctags;
-	foreach (grep { -f $_ } map { "$git_dir/ctags/$_" } readdir($dh)) {
-		open my $ct, '<', $_ or next;
-		my $val = <$ct>;
-		chomp $val;
-		close $ct;
-		my $ctag = $_; $ctag =~ s#.*/##;
-		$ctags->{$ctag} = $val;
+	$git_dir = "$projectroot/$project";
+	if (opendir my $dh, "$git_dir/ctags") {
+		my @files = grep { -f $_ } map { "$git_dir/ctags/$_" } readdir($dh);
+		foreach my $tagfile (@files) {
+			open my $ct, '<', $tagfile
+				or next;
+			my $val = <$ct>;
+			chomp $val if $val;
+			close $ct;
+
+			(my $ctag = $tagfile) =~ s#.*/##;
+			if ($val =~ /\d+/) {
+				$ctags->{$ctag} = $val;
+			} else {
+				$ctags->{$ctag} = 1;
+			}
+		}
+		closedir $dh;
+
+	} elsif (open my $fh, '<', "$git_dir/ctags") {
+		while (my $line = <$fh>) {
+			chomp $line;
+			$ctags->{$line}++ if $line;
+		}
+		close $fh;
+
+	} else {
+		my $taglist = config_to_multi(git_get_project_config('ctag'));
+		foreach my $tag (@$taglist) {
+			$ctags->{$tag}++;
+		}
 	}
-	closedir $dh;
-	$ctags;
+
+	return $ctags;
+}
+
+# return hash, where keys are content tags ('ctags'),
+# and values are sum of weights of given tag in every project
+sub git_gather_all_ctags {
+	my $projects = shift;
+	my $ctags = {};
+
+	foreach my $p (@$projects) {
+		foreach my $ct (keys %{$p->{'ctags'}}) {
+			$ctags->{$ct} += $p->{'ctags'}->{$ct};
+		}
+	}
+
+	return $ctags;
 }
 
 sub git_populate_project_tagcloud {
@@ -2608,31 +2655,41 @@ sub git_populate_project_tagcloud {
 	my $cloud;
 	if (eval { require HTML::TagCloud; 1; }) {
 		$cloud = HTML::TagCloud->new;
-		foreach (sort keys %ctags_lc) {
+		foreach my $ctag (sort keys %ctags_lc) {
 			# Pad the title with spaces so that the cloud looks
 			# less crammed.
-			my $title = $ctags_lc{$_}->{topname};
+			my $title = esc_html($ctags_lc{$ctag}->{topname});
 			$title =~ s/ /&nbsp;/g;
 			$title =~ s/^/&nbsp;/g;
 			$title =~ s/$/&nbsp;/g;
-			$cloud->add($title, $home_link."?by_tag=".$_, $ctags_lc{$_}->{count});
+			$cloud->add($title, href(project=>undef, ctag=>$ctag),
+			            $ctags_lc{$ctag}->{count});
 		}
 	} else {
-		$cloud = \%ctags_lc;
+		$cloud = {};
+		foreach my $ctag (keys %ctags_lc) {
+			my $title = $ctags_lc{$ctag}->{topname};
+			$cloud->{$ctag}{count} = $ctags_lc{$ctag}->{count};
+			$cloud->{$ctag}{ctag} =
+				$cgi->a({-href=>href(project=>undef, ctag=>$ctag)},
+			          esc_html($title, -nbsp=>1));
+		}
 	}
-	$cloud;
+	return $cloud;
 }
 
 sub git_show_project_tagcloud {
 	my ($cloud, $count) = @_;
-	print STDERR ref($cloud)."..\n";
 	if (ref $cloud eq 'HTML::TagCloud') {
 		return $cloud->html_and_css($count);
 	} else {
-		my @tags = sort { $cloud->{$a}->{count} <=> $cloud->{$b}->{count} } keys %$cloud;
-		return '<p align="center">' . join (', ', map {
-			$cgi->a({-href=>"$home_link?by_tag=$_"}, $cloud->{$_}->{topname})
-		} splice(@tags, 0, $count)) . '</p>';
+		my @tags = sort { $cloud->{$a}->{'count'} <=> $cloud->{$b}->{'count'} } keys %$cloud;
+		return
+			'<div id="htmltagcloud"'.($project ? '' : ' align="center"').'>' .
+			join (', ', map {
+				$cloud->{$_}->{'ctag'}
+			} splice(@tags, 0, $count)) .
+			'</div>';
 	}
 }
 
@@ -4920,13 +4977,8 @@ sub git_project_list_body {
 	@projects = sort_projects_list(\@projects, $order);
 
 	if ($show_ctags) {
-		my %ctags;
-		foreach my $p (@projects) {
-			foreach my $ct (keys %{$p->{'ctags'}}) {
-				$ctags{$ct} += $p->{'ctags'}->{$ct};
-			}
-		}
-		my $cloud = git_populate_project_tagcloud(\%ctags);
+		my $ctags = git_gather_all_ctags(\@projects);
+		my $cloud = git_populate_project_tagcloud($ctags);
 		print git_show_project_tagcloud($cloud, 64);
 	}
 
@@ -5521,13 +5573,14 @@ sub git_summary {
 	my $show_ctags = gitweb_check_feature('ctags');
 	if ($show_ctags) {
 		my $ctags = git_get_project_ctags($project);
-		my $cloud = git_populate_project_tagcloud($ctags);
-		print "<tr id=\"metadata_ctags\"><td>Content tags:<br />";
-		print "</td>\n<td>" unless %$ctags;
-		print "<form action=\"$show_ctags\" method=\"post\"><input type=\"hidden\" name=\"p\" value=\"$project\" />Add: <input type=\"text\" name=\"t\" size=\"8\" /></form>";
-		print "</td>\n<td>" if %$ctags;
-		print git_show_project_tagcloud($cloud, 48);
-		print "</td></tr>";
+		if (%$ctags) {
+			# without ability to add tags, don't show if there are none
+			my $cloud = git_populate_project_tagcloud($ctags);
+			print "<tr id=\"metadata_ctags\">" .
+			      "<td>content tags</td>" .
+			      "<td>".git_show_project_tagcloud($cloud, 48)."</td>" .
+			      "</tr>\n";
+		}
 	}
 
 	print "</table>\n";
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)
  2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
  2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
  2011-04-29 17:51 ` [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
  2011-04-29 17:51 ` [PATCH 4/6] gitweb: Split git_project_list_body in two functions Jakub Narebski
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
	Jakub Narebski

It might have been hard to discover that current view is limited to
projects with given content tag (ctag), as it was distinquished only
in gitweb URL.  Mark matched contents tag in the tag cloud using
"match" class, for easier discovery.

This commit introduces a bit of further code duplication in
git_populate_project_tagcloud().

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Acked-by: Petr Baudis <pasky@suse.cz>
---
This patch was originally sent to git mailing list as

  [PATCH 2/1] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)
  http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/focus=168731

(yes, it was crossposted).

 gitweb/gitweb.perl |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 60cb772..e81c27d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2653,6 +2653,7 @@ sub git_populate_project_tagcloud {
 	}
 
 	my $cloud;
+	my $matched = $cgi->param('by_tag');
 	if (eval { require HTML::TagCloud; 1; }) {
 		$cloud = HTML::TagCloud->new;
 		foreach my $ctag (sort keys %ctags_lc) {
@@ -2662,17 +2663,22 @@ sub git_populate_project_tagcloud {
 			$title =~ s/ /&nbsp;/g;
 			$title =~ s/^/&nbsp;/g;
 			$title =~ s/$/&nbsp;/g;
+			if (defined $matched && $matched eq $ctag) {
+				$title = qq(<span class="match">$title</span>);
+			}
 			$cloud->add($title, href(project=>undef, ctag=>$ctag),
 			            $ctags_lc{$ctag}->{count});
 		}
 	} else {
 		$cloud = {};
 		foreach my $ctag (keys %ctags_lc) {
-			my $title = $ctags_lc{$ctag}->{topname};
+			my $title = esc_html($ctags_lc{$ctag}->{topname}, -nbsp=>1);
+			if (defined $matched && $matched eq $ctag) {
+				$title = qq(<span class="match">$title</span>);
+			}
 			$cloud->{$ctag}{count} = $ctags_lc{$ctag}->{count};
 			$cloud->{$ctag}{ctag} =
-				$cgi->a({-href=>href(project=>undef, ctag=>$ctag)},
-			          esc_html($title, -nbsp=>1));
+				$cgi->a({-href=>href(project=>undef, ctag=>$ctag)}, $title);
 		}
 	}
 	return $cloud;
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] gitweb: Split git_project_list_body in two functions
  2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
                   ` (2 preceding siblings ...)
  2011-04-29 17:51 ` [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo) Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
  2011-04-29 17:52 ` [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic Jakub Narebski
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
	Jakub Narebski

From: Sebastien Cevey <seb@cine7.net>

Extract the printing of project rows (body/contents of projects list
table) on the 'project_list' page into a separate git_project_list_rows
function. This makes it easier to reuse the code to print different
subsets of the whole project list.

[jn: Updated to post restructuring projects list generation]

Signed-off-by: Sebastien Cevey <seb@cine7.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch and subsequent patches were sent to git mailing list
(before rebasing on top of 'ctags' improvements) as

  [RFC/PATCH 0/3] gitweb: Categories support
  http://thread.gmane.org/gmane.comp.version-control.git/168616


This subseries is long in wait port of Sebastien Cevey series from
December 2008

  http://thread.gmane.org/gmane.comp.version-control.git/102844

to modern gitweb.  It was waiting for restructuring of projects list
generation that is part of this series.

Uwe, it might be an alternative to use of ctags (content tags, aka
project labels).

 gitweb/gitweb.perl |   89 +++++++++++++++++++++++++++++-----------------------
 1 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e81c27d..f8d5722 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4946,6 +4946,55 @@ sub format_sort_th {
 	return $sort_th;
 }
 
+sub git_project_list_rows {
+	my ($projlist, $from, $to, $check_forks) = @_;
+
+	$from = 0 unless defined $from;
+	$to = $#$projlist if (!defined $to || $#$projlist < $to);
+
+	my $alternate = 1;
+	for (my $i = $from; $i <= $to; $i++) {
+		my $pr = $projlist->[$i];
+
+		if ($alternate) {
+			print "<tr class=\"dark\">\n";
+		} else {
+			print "<tr class=\"light\">\n";
+		}
+		$alternate ^= 1;
+
+		if ($check_forks) {
+			print "<td>";
+			if ($pr->{'forks'}) {
+				my $nforks = scalar @{$pr->{'forks'}};
+				if ($nforks > 0) {
+					print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks"),
+					               -title => "$nforks forks"}, "+");
+				} else {
+					print $cgi->span({-title => "$nforks forks"}, "+");
+				}
+			}
+			print "</td>\n";
+		}
+		print "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
+		                        -class => "list"}, esc_html($pr->{'path'})) . "</td>\n" .
+		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
+		                        -class => "list", -title => $pr->{'descr_long'}},
+		                        esc_html($pr->{'descr'})) . "</td>\n" .
+		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
+		print "<td class=\"". age_class($pr->{'age'}) . "\">" .
+		      (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
+		      "<td class=\"link\">" .
+		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary")   . " | " .
+		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
+		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
+		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
+		      ($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
+		      "</td>\n" .
+		      "</tr>\n";
+	}
+}
+
 sub git_project_list_body {
 	# actually uses global variable $project
 	my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
@@ -5001,47 +5050,9 @@ sub git_project_list_body {
 		print "<th></th>\n" . # for links
 		      "</tr>\n";
 	}
-	my $alternate = 1;
-	for (my $i = $from; $i <= $to; $i++) {
-		my $pr = $projects[$i];
 
-		if ($alternate) {
-			print "<tr class=\"dark\">\n";
-		} else {
-			print "<tr class=\"light\">\n";
-		}
-		$alternate ^= 1;
+	git_project_list_rows(\@projects, $from, $to, $check_forks);
 
-		if ($check_forks) {
-			print "<td>";
-			if ($pr->{'forks'}) {
-				my $nforks = scalar @{$pr->{'forks'}};
-				if ($nforks > 0) {
-					print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks"),
-					               -title => "$nforks forks"}, "+");
-				} else {
-					print $cgi->span({-title => "$nforks forks"}, "+");
-				}
-			}
-			print "</td>\n";
-		}
-		print "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
-		                        -class => "list"}, esc_html($pr->{'path'})) . "</td>\n" .
-		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
-		                        -class => "list", -title => $pr->{'descr_long'}},
-		                        esc_html($pr->{'descr'})) . "</td>\n" .
-		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
-		print "<td class=\"". age_class($pr->{'age'}) . "\">" .
-		      (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
-		      "<td class=\"link\">" .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary")   . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
-		      ($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
-		      "</td>\n" .
-		      "</tr>\n";
-	}
 	if (defined $extra) {
 		print "<tr>\n";
 		if ($check_forks) {
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic
  2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
                   ` (3 preceding siblings ...)
  2011-04-29 17:51 ` [PATCH 4/6] gitweb: Split git_project_list_body in two functions Jakub Narebski
@ 2011-04-29 17:52 ` Jakub Narebski
  2011-04-29 17:52 ` [PATCH 6/6] gitweb: Optional grouping of projects by category Jakub Narebski
  2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
  6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:52 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
	Jakub Narebski

From: Sebastien Cevey <seb@cine7.net>

Introduce a git_get_file_or_project_config utility function to
retrieve a repository variable either from a plain text file in the
$GIT_DIR or else from 'gitweb.$variable' in the repository config
(e.g. 'description').

This would be used in next commit to retrieve category for a project,
which is to be stored in the same way as project description.

Signed-off-by: Sebastien Cevey <seb@cine7.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f8d5722..e8685ac 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2562,18 +2562,26 @@ sub git_get_path_by_hash {
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
-sub git_get_project_description {
-	my $path = shift;
+# get the value of config variable either from file named as the variable
+# itself in the repository ($GIT_DIR/$name file), or from gitweb.$name
+# configuration variable in the repository config file.
+sub git_get_file_or_project_config {
+	my ($path, $name) = @_;
 
 	$git_dir = "$projectroot/$path";
-	open my $fd, '<', "$git_dir/description"
-		or return git_get_project_config('description');
-	my $descr = <$fd>;
+	open my $fd, '<', "$git_dir/$name"
+		or return git_get_project_config($name);
+	my $conf = <$fd>;
 	close $fd;
-	if (defined $descr) {
-		chomp $descr;
+	if (defined $conf) {
+		chomp $conf;
 	}
-	return $descr;
+	return $conf;
+}
+
+sub git_get_project_description {
+	my $path = shift;
+	return git_get_file_or_project_config($path, 'description');
 }
 
 # supported formats:
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] gitweb: Optional grouping of projects by category
  2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
                   ` (4 preceding siblings ...)
  2011-04-29 17:52 ` [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic Jakub Narebski
@ 2011-04-29 17:52 ` Jakub Narebski
  2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
  6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:52 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
	Jakub Narebski

From: Sebastien Cevey <seb@cine7.net>

This adds the $projects_list_group_categories option which, if enabled,
will result in grouping projects by category on the project list page.
The category is specified for each project by the $GIT_DIR/category file
or the 'gitweb.category' variable in its configuration file. By default,
projects are put in the $project_list_default_category category.

Note:
- Categories are always sorted alphabetically, with projects in
  each category sorted according to the globally selected $order.
- When displaying a subset of all the projects (page limiting), the
  category headers are only displayed for projects present on the page.

The feature is inspired from Sham Chukoury's patch for the XMMS2
gitweb, but has been rewritten for the current gitweb code. The CSS
for categories is inspired from Gustavo Sverzut Barbieri's patch to
group projects by path.

Thanks to Florian Ragwitz for Perl tips.

[jn: Updated to post restructuring projects list generation, fixed bugs,
 added very basic test in t9500 that there are no warnings from Perl.]

Signed-off-by: Sebastien Cevey <seb@cine7.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/README                          |   16 ++++++++
 gitweb/gitweb.perl                     |   62 ++++++++++++++++++++++++++++++--
 gitweb/static/gitweb.css               |    7 ++++
 t/t9500-gitweb-standalone-no-errors.sh |    8 ++++
 4 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/gitweb/README b/gitweb/README
index a92bde7..a3a697b 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -207,6 +207,15 @@ not include variables usually directly set during build):
    full description is available as 'title' attribute (usually shown on
    mouseover).  By default set to 25, which might be too small if you
    use long project descriptions.
+ * $projects_list_group_categories
+   Enables the grouping of projects by category on the project list page.
+   The category of a project is determined by the $GIT_DIR/category
+   file or the 'gitweb.category' variable in its repository configuration.
+   Disabled by default.
+ * $project_list_default_category
+   Default category for projects for which none is specified.  If set
+   to the empty string, such projects will remain uncategorized and
+   listed at the top, above categorized projects.
  * @git_base_url_list
    List of git base URLs used for URL to where fetch project from, shown
    in project summary page.  Full URL is "$git_base_url/$project".
@@ -314,6 +323,13 @@ You can use the following files in repository:
    from the template during repository creation. You can use the
    gitweb.description repo configuration variable, but the file takes
    precedence.
+ * category (or gitweb.category)
+   Singe line category of a project, used to group projects if
+   $projects_list_group_categories is enabled. By default (file and
+   configuration variable absent), uncategorized projects are put in
+   the $project_list_default_category category. You can use the
+   gitweb.category repo configuration variable, but the file takes
+   precedence.
  * cloneurl (or multiple-valued gitweb.url)
    File with repository URL (used for clone and fetch), one per line.
    Displayed in the project summary page. You can use multiple-valued
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e8685ac..f78fdd7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -115,6 +115,14 @@ our $projects_list = "++GITWEB_LIST++";
 # the width (in characters) of the projects list "Description" column
 our $projects_list_description_width = 25;
 
+# group projects by category on the projects list
+# (enabled if this variable evaluates to true)
+our $projects_list_group_categories = 0;
+
+# default category if none specified
+# (leave the empty string for no category)
+our $project_list_default_category = "";
+
 # default order of projects list
 # valid values are none, project, descr, owner, and age
 our $default_projects_order = "project";
@@ -2584,6 +2592,12 @@ sub git_get_project_description {
 	return git_get_file_or_project_config($path, 'description');
 }
 
+sub git_get_project_category {
+	my $path = shift;
+	return git_get_file_or_project_config($path, 'category');
+}
+
+
 # supported formats:
 # * $GIT_DIR/ctags/<tagname> file (in 'ctags' subdirectory)
 #   - if its contents is a number, use it as tag weight,
@@ -4877,8 +4891,9 @@ sub git_patchset_body {
 
 # . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 
-# fills project list info (age, description, owner, forks) for each
-# project in the list, removing invalid projects from returned list
+# fills project list info (age, description, owner, category, forks)
+# for each project in the list, removing invalid projects from
+# returned list
 # NOTE: modifies $projlist, but does not remove entries from it
 sub fill_project_list_info {
 	my $projlist = shift;
@@ -4904,6 +4919,12 @@ sub fill_project_list_info {
 		if ($show_ctags) {
 			$pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
 		}
+		if ($projects_list_group_categories && !defined $pr->{'category'}) {
+			my $cat = git_get_project_category($pr->{'path'}) ||
+			                                   $project_list_default_category;
+			$pr->{'category'} = to_utf8($cat);
+		}
+
 		push @projects, $pr;
 	}
 
@@ -4931,6 +4952,23 @@ sub sort_projects_list {
 	return @projects;
 }
 
+# returns a hash of categories, containing the list of project
+# belonging to each category
+sub build_projlist_by_category {
+	my ($projlist, $from, $to) = @_;
+	my %categories;
+
+	$from = 0 unless defined $from;
+	$to = $#$projlist if (!defined $to || $#$projlist < $to);
+
+	for (my $i = $from; $i <= $to; $i++) {
+		my $pr = $projlist->[$i];
+		push @{$categories{ $pr->{'category'} }}, $pr;
+	}
+
+	return wantarray ? %categories : \%categories;
+}
+
 # print 'sort by' <th> element, generating 'sort by $name' replay link
 # if that order is not selected
 sub print_sort_th {
@@ -5059,7 +5097,25 @@ sub git_project_list_body {
 		      "</tr>\n";
 	}
 
-	git_project_list_rows(\@projects, $from, $to, $check_forks);
+	if ($projects_list_group_categories) {
+		# only display categories with projects in the $from-$to window
+		@projects = sort {$a->{'category'} cmp $b->{'category'}} @projects[$from..$to];
+		my %categories = build_projlist_by_category(\@projects, $from, $to);
+		foreach my $cat (sort keys %categories) {
+			unless ($cat eq "") {
+				print "<tr>\n";
+				if ($check_forks) {
+					print "<td></td>\n";
+				}
+				print "<td class=\"category\" colspan=\"5\">".esc_html($cat)."</td>\n";
+				print "</tr>\n";
+			}
+
+			git_project_list_rows($categories{$cat}, undef, undef, $check_forks);
+		}
+	} else {
+		git_project_list_rows(\@projects, $from, $to, $check_forks);
+	}
 
 	if (defined $extra) {
 		print "<tr>\n";
diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 79d7eeb..4df2d16 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -295,6 +295,13 @@ td.current_head {
 	text-decoration: underline;
 }
 
+td.category {
+	background-color: #d9d8d1;
+	border-top: 1px solid #000000;
+	border-left: 1px solid #000000;
+	font-weight: bold;
+}
+
 table.diff_tree span.file_status.new {
 	color: #008000;
 }
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 71ef0ac..f5648a6 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -644,4 +644,12 @@ test_expect_success \
 	'ctags: search projects by non existent tag' \
 	'gitweb_run "by_tag=non-existent"'
 
+# ----------------------------------------------------------------------
+# categories
+
+test_expect_success \
+	'categories: projects list, only default category' \
+	'echo "\$projects_list_group_categories = 1;" >>gitweb_config.perl &&
+	 gitweb_run'
+
 test_done
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] gitweb: Improve ctags, introduce categories
  2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
                   ` (5 preceding siblings ...)
  2011-04-29 17:52 ` [PATCH 6/6] gitweb: Optional grouping of projects by category Jakub Narebski
@ 2011-04-29 21:31 ` Junio C Hamano
  2011-04-29 23:53   ` Jakub Narebski
  2011-04-30 20:36   ` Øyvind A. Holm
  6 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2011-04-29 21:31 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley, Uwe Kleine-Koenig,
	Jonathan Nieder, Petr Baudis, Sebastien Cevey

A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
"Subject" header, like this:

  Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
   way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=

The message actually has the above as a long single line, as can be seen
at http://article.gmane.org/gmane.comp.version-control.git/172479/raw

Just being curious.

The headers suggest that sending MUA was git-send-email speaking to gmail
SMTP.  Did we introduce bugs to send-email recently?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] gitweb: Improve ctags, introduce categories
  2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
@ 2011-04-29 23:53   ` Jakub Narebski
  2011-04-30 20:36   ` Øyvind A. Holm
  1 sibling, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 23:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley, Uwe Kleine-Koenig,
	Jonathan Nieder, Petr Baudis, Sebastien Cevey

Junio C Hamano wrote:

> A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> "Subject" header, like this:
> 
>   Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
>    way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
> 
> The message actually has the above as a long single line, as can be seen
> at http://article.gmane.org/gmane.comp.version-control.git/172479/raw
> 
> Just being curious.
> 
> The headers suggest that sending MUA was git-send-email speaking to gmail
> SMTP.  Did we introduce bugs to send-email recently?

This is git-send-email from git version 1.7.3 (not most recent).

What might be important, and what you can't get from mail itself, is that
patch was generated using git-format-patch, but UTF-8 characters in body
of email were introduced during editing it, and were not present in commit
message (sorry, bad practice).  So git-send-email asked about encoding,
and I chosen default UTF-8

  The following files are 8bit, but do not declare a Content-Transfer-Encoding.
      mdir.7/0002-gitweb-Change-the-way-content-tags-ctags-are-handled.txt
  Which 8bit encoding should I declare [UTF-8]? <ENTER>

Though subject itself does contain only 7bit US-ASCII, so there is no need
for encoding it (with quoted-printable, as Subject appears before 
Content-Encoding and MIME-Version headers).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] gitweb: Improve ctags, introduce categories
  2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
  2011-04-29 23:53   ` Jakub Narebski
@ 2011-04-30 20:36   ` Øyvind A. Holm
  2011-05-03 14:02     ` git-send-email and non 7bit clean message (was: [PATCH 0/6] gitweb: Improve ctags, introduce categories) Jakub Narebski
  1 sibling, 1 reply; 13+ messages in thread
From: Øyvind A. Holm @ 2011-04-30 20:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jakub Narebski, git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley, Uwe Kleine-Koenig,
	Jonathan Nieder, Petr Baudis, Sebastien Cevey

On 29 April 2011 23:31, Junio C Hamano <gitster@pobox.com> wrote:
> A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> "Subject" header, like this:
>
>  Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
>   way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
>
> The message actually has the above as a long single line, as can be seen at
> http://article.gmane.org/gmane.comp.version-control.git/172479/raw
>
> Just being curious.

This seems as the same thing that I reported on 2010-04-25 23:35:49Z,
<http://thread.gmane.org/gmane.comp.version-control.git/145774>. If there's a
character above U+007F in the log message below line #2, the Subject: line is
garbled. In this case it is, it's the "ö" in Uwe's name that leads to this
error.

A test to reproduce this is at <https://gist.github.com/378785>, but it seems
as this was fixed between v1.7.4.1-292-ge2a57aa and v1.7.4.1-343-ga91df69 ,
probably happened in dc7f96f (Merge branch 'jk/format-patch-multiline-header').
The patch at <http://article.gmane.org/gmane.comp.version-control.git/172479/raw>
was generated with git-1.7.3, so it would trigger the error in this case.

Regards,
Øyvind

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-send-email and non 7bit clean message (was: [PATCH 0/6] gitweb: Improve ctags, introduce categories)
  2011-04-30 20:36   ` Øyvind A. Holm
@ 2011-05-03 14:02     ` Jakub Narebski
  2011-05-04 13:50       ` [PATCH/RFC] git-send-email: Do not encode Subject if not required (was: git-send-email and non 7bit clean message) Jakub Narebski
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2011-05-03 14:02 UTC (permalink / raw)
  To: Øyvind A. Holm
  Cc: Junio C Hamano, git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley, Uwe Kleine-Koenig,
	Jonathan Nieder, Petr Baudis, Sebastien Cevey, Greg Kroah-Hartman,
	Ryan Anderson

On Sat, 30 Apr 2011, Øyvind A. Holm wrote:
> On 29 April 2011 23:31, Junio C Hamano <gitster@pobox.com> wrote:
> > A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> > "Subject" header, like this:
> >
> >  Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
> >   way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
> >
> > The message actually has the above as a long single line, as can be seen at
> > http://article.gmane.org/gmane.comp.version-control.git/172479/raw
> >
> > Just being curious.
> 
> This seems as the same thing that I reported on 2010-04-25 23:35:49Z,
> <http://thread.gmane.org/gmane.comp.version-control.git/145774>. If there's a
> character above U+007F in the log message below line #2, the Subject: line is
> garbled. In this case it is, it's the "ö" in Uwe's name that leads to this
> error.
> 
> A test to reproduce this is at <https://gist.github.com/378785>, but it seems
> as this was fixed between v1.7.4.1-292-ge2a57aa and v1.7.4.1-343-ga91df69 ,
> probably happened in dc7f96f (Merge branch 'jk/format-patch-multiline-header').
> The patch at <http://article.gmane.org/gmane.comp.version-control.git/172479/raw>
> was generated with git-1.7.3, so it would trigger the error in this case.

I have just upgraded git to 1.7.5, and unfortunately it still has the
same bug (note that UTF-8 character was introduced while editing patch,
so git-format-patch doesn't see it):

  5369:[gitweb/web@git]# git send-email [...] --dry-run mdir.1/*.txt
  The following files are 8bit, but do not declare a Content-Transfer-Encoding.
      mdir.1/0001-gitweb-Prepare-for-splitting-gitweb.txt
  Which 8bit encoding should I declare [UTF-8]? <ENTER>
  Dry-Sent [PATCHv2 0/2] gitweb: Beginnings of splitting gitweb into modules
  Dry-Sent =?UTF-8?q?=5BPATCHv2=201/2=20=28RFC=3F=29=5D=20gitweb=3A=20Prepare=20for=20splitting=20gitweb?=
  Dry-Sent [PATCHv2 2/2 (PoC)] gitweb: Create Gitweb::Util module

Note that having

  MIME-Version: 1.0
  Content-Type: text/plain; charset=utf-8

is not enough[1].  However the patch send is understood correctly by email
programs: see [PATCH 2/6] in this thread.

I have added 

  Content-Transfer-Encoding: 8bit

to mdir.1/0001-gitweb-Prepare-for-splitting-gitweb.txt, and now it works
all right.

  5370:[gitweb/web@git]# git send-email [...] mdir.1/*.txt
  Dry-Sent [PATCHv2 0/2] gitweb: Beginnings of splitting gitweb into modules
  Dry-Sent [PATCHv2 1/2 (RFC?)] gitweb: Prepare for splitting gitweb
  Dry-Sent [PATCHv2 2/2 (PoC)] gitweb: Create Gitweb::Util module


Footnotes:
^^^^^^^^^^
[1]: Note that git-send-email does something strange: first, the problem
is with Content-Transfer-Encoding, and git-send-email asks for 8bit
encoding, suggesting UTF-8, instead of asking for transfer encoding,
sugesting e.g. 8bit.

Second, from email headers I have added git-send-email should _know_ that
message uses UTF-8 encoding (though this is side issue, and probably result
of above).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH/RFC] git-send-email: Do not encode Subject if not required (was: git-send-email and non 7bit clean message)
  2011-05-03 14:02     ` git-send-email and non 7bit clean message (was: [PATCH 0/6] gitweb: Improve ctags, introduce categories) Jakub Narebski
@ 2011-05-04 13:50       ` Jakub Narebski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-05-04 13:50 UTC (permalink / raw)
  To: Øyvind A. Holm
  Cc: Junio C Hamano, git, Greg Kroah-Hartman, Ryan Anderson

On Tue, 3 May 2011, Jakub Narebski wrote:
> On Sat, 30 Apr 2011, Øyvind A. Holm wrote:
> > On 29 April 2011 23:31, Junio C Hamano <gitster@pobox.com> wrote:

> > > A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> > > "Subject" header, like this:
> > >
> > >  Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
> > >   way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
> > >
> > > The message actually has the above as a long single line, as can be seen at
> > > http://article.gmane.org/gmane.comp.version-control.git/172479/raw
> > >
> > > Just being curious.
> > 
> > This seems as the same thing that I reported on 2010-04-25 23:35:49Z,
> > <http://thread.gmane.org/gmane.comp.version-control.git/145774>. If there's a
> > character above U+007F in the log message below line #2, the Subject: line is
> > garbled. In this case it is, it's the "ö" in Uwe's name that leads to this
> > error.

Note: this line was added "by hand" when editing patch, so commit message
that git-format-patch seen did not contain any non-ASCII characters.
Otherwise fir-format-patch would add correct MIME headers itself.

> > A test to reproduce this is at <https://gist.github.com/378785>, but it seems
> > as this was fixed between v1.7.4.1-292-ge2a57aa and v1.7.4.1-343-ga91df69 ,
> > probably happened in dc7f96f (Merge branch 'jk/format-patch-multiline-header').
> > The patch at <http://article.gmane.org/gmane.comp.version-control.git/172479/raw>
> > was generated with git-1.7.3, so it would trigger the error in this case.
> 
> I have just upgraded git to 1.7.5, and unfortunately it still has the
> same bug (note that UTF-8 character was introduced while editing patch,
> so git-format-patch doesn't see it):
> 
>   5369:[gitweb/web@git]# git send-email [...] --dry-run mdir.1/*.txt
>   The following files are 8bit, but do not declare a Content-Transfer-Encoding.
>       mdir.1/0001-gitweb-Prepare-for-splitting-gitweb.txt
>   Which 8bit encoding should I declare [UTF-8]? <ENTER>
>   Dry-Sent [PATCHv2 0/2] gitweb: Beginnings of splitting gitweb into modules
>   Dry-Sent =?UTF-8?q?=5BPATCHv2=201/2=20=28RFC=3F=29=5D=20gitweb=3A=20Prepare=20for=20splitting=20gitweb?=
>   Dry-Sent [PATCHv2 2/2 (PoC)] gitweb: Create Gitweb::Util module
> 
> Note that having
> 
>   MIME-Version: 1.0
>   Content-Type: text/plain; charset=utf-8
> 
> is not enough[1].
 
Perhaps something like attached patch could be beginning of fixing
this issue.
 
> Footnotes:
> ^^^^^^^^^^
> [1]: Note that git-send-email does something strange: first, the problem
> is with Content-Transfer-Encoding, and git-send-email asks for 8bit
> encoding, suggesting UTF-8, instead of asking for transfer encoding,
> sugesting e.g. 8bit.

Actually git-send-email assumes 8bit if not stated, but it does not
parse 'Content-Type' header for encoding to be used, perhaps correctly
assuming that if required 'Content-Transfer-Encoding' is missing, then
'Content-Type' is most probably missing as well.

-- >8 ---------- >8 --
Subject: [PATCH] git-send-email: Do not encode Subject if not required

If "Subject:" header of an email does not contain non-ASCII characters,
then we don't need to RFC-2047 encode it, even if file had broken
encoding.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 git-send-email.perl |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 98ab33a..1c6b1a8 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1281,7 +1281,9 @@ foreach my $t (@files) {
 		$body_encoding = $auto_8bit_encoding;
 	}
 
-	if ($broken_encoding{$t} && !is_rfc2047_quoted($subject)) {
+	if ($broken_encoding{$t} &&
+	    !is_rfc2047_quoted($subject) &&
+	    $subject =~ /[^[:ascii:]]/) {
 		$subject = quote_rfc2047($subject, $auto_8bit_encoding);
 	}
 
-- 
1.7.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/6 (v2)] gitweb: Restructure projects list generation
  2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
@ 2011-05-07 18:39   ` Jakub Narebski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-05-07 18:39 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey

[-- Attachment #1: Type: text/plain, Size: 6494 bytes --]

On Fri, 29 Apr 2011, Jakub Narebski wrote:

> Extract filtering out forks (which is done if 'forks' feature is
> enabled) into filter_forks_from_projects_list subroutine, and
> searching projects (via projects search form, or via content tags)
> into search_projects_list subroutine.

If it was all this patch did, it would be probably easier to review.
Should I then split this patch into smaller, though not entirely
standalone patches: pure refactoring, improving finding forks, and
adding tests?

> Sorting projects got also refactored in a very straight way (just
> moving code) into sort_projects_list subroutine.

This also could be refactored in a separate patch... though I rather
have all refactorings together, so one can see the result: clean code.
 
> The interaction between forks, content tags and searching is now made
> more explicit: searching whether by tag, or via search form turns off
> fork filtering (gitweb searches also forks, and will show all
> results).  If 'ctags' feature is disabled, then searching by tag is
> too.

This is functional change that makes code more clear, and I think it
makes better behavior... but it changes how gitweb behaves, so it is
not pure refactoring.  Should I put it into a bit artificial separate
patch?
 
> While at it we now detect that there are no projects and respond with
> "404 No projects found" also for 'project_index' and 'opml' actions.

This "while at it" is here because we can now say "No projects found"
for 'projects_list' action, because filtering / searching is done
upfront, so we know if we found anything.

But again, this perhaps would be better put in a separate patch.
 
[...]
> filter_forks_from_projects_list() uses trie[1] (prefix tree) to find
> which repositories are forks of other projects; directories are used
> as "letters" in trie.  Algorithm to create trie and find prefix match
> was created with some help of code of Tree::Tree[2] Perl module from
> CPAN.
> 
> [1]: http://en.wikipedia.org/wiki/Trie
> [2]: http://p3rl.org/Tree::Trie

This change in algorithm should perhaps be put in a separate patch...

> Note that before this commit filtering out forks worked only for
> $projects_list being a file, it assumed that forks are always after
> forked project (not very unreasonable), and used algorithm which is
> O(<projects>^2).
> 
> Current code filters out projects only where it should, regardless
> whether $projects_list is directory or a file, and trie-based
> algorithm is O(<projects>) + O(<projects forked>*<max depth>).
> Current code also consistently finds shortest prefix (important in
> counting forks in case of forks of forks).

Jakub Narebski wrote in
"Re: What's cooking in git.git (May 2011, #02; Wed, 4)"
http://thread.gmane.org/gmane.comp.version-control.git/172789/focus=172820

> Junio C Hamano <gitster@pobox.com> writes:
[...]
>> * jn/ctags (2011-04-29) 6 commits
>>  - gitweb: Optional grouping of projects by category
>>  - gitweb: Modularized git_get_project_description to be more generic
>>  - gitweb: Split git_project_list_body in two functions
>>  - gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)
>>  - gitweb: Change the way "content tags" ('ctags') are handled
>>  - gitweb: Restructure projects list generation
>> 
>> Waiting for comments.
> 
> Should I do and post benchmarks for
> 
>    - gitweb: Restructure projects list generation
> 
> change (when 'forks' feature is used)?

It is not easy to compare old and new code, because those algorithm
have different behavior.

Best I can come up with is comparing pure fork detection (not the time
it takes to generate page) - I can easy mock that - for sizes comparable
with http://repo.or.cz which uses 'forks' feature.  repo.or.cz has
around 4,000 projects, around 300 of those have forks (note: forks are
not included in number of projects).  From non-representative sample
most forked project have but single fork; note however that there are
repositories such as git.git, which has 55 forks, and some of those have
forks on its own (forks of forks).

 plan: nprojects=4000; nprojects_forked=300; nforks_per_forked_project=1
 number of projects: 4000

 Benchmark: running old, new for at least 600 CPU seconds...
        old: 1231.38 wallclock secs (620.23 usr +  1.30 sys = 621.53 CPU) @  0.03/s (n=21)
        new:  847.6  wallclock secs (632.34 usr +  0.80 sys = 633.14 CPU) @  5.18/s (n=3277)
     s/iter    old    new
 old   29.6     --   -99%
 new  0.193 15219%     --

 Dumbbench: running old, new for at least 50 iterations...
 old: Ran 106 iterations (25 outliers).
 old: Rounded run time per iteration: 3.316e+01  +/- 1.5e-01 (0.5%)
 new: Ran 66 iterations (15 outliers).
 new: Rounded run time per iteration: 2.0213e-01 +/- 2.1e-04 (0.1%)

Below there is table of time versus number of projects, used to generate
attached and linked images.  Data was generated using Dumbbench module
from CPAN.

 ########################################################
 # nforks  = 5 (forks per project forked)
 # nforked = 5.00%
 # 1:projects 2:nforked  3:old 4:old_err  5:new 5:new_err
 #  0   0  8.959e-06  2.2e-08  1.0137e-05 2.0e-08
   50   3  3.211e-03  1.6e-05  2.0524e-03 5.0e-06
  103   5  1.4003e-02 6.7e-05  4.477e-03  2.2e-05
  204  10  5.647e-02  2.3e-04  9.464e-03  4.6e-05
  402  20  2.1939e-01 3.2e-04  1.9943e-02 9.6e-05
  500  25  3.666e-01  3.6e-03  2.706e-02  2.5e-04
  800  40  9.124e-01  1.7e-03  4.175e-02  2.1e-04
 1000  50  1.575e+00  1.3e-02  5.896e-02  5.3e-04
 1600  80  3.738e+00  1.2e-02  9.347e-02  4.3e-04
 2004 100  6.820e+00  6.7e-02  1.1216e-01 3.6e-04
 3203 160  1.8189e+01 8.5e-02  1.8144e-01 4.5e-04
 4003 200  2.927e+01  2.9e-01  2.2939e-01 5.7e-04

Script used to generate this data is attached.

From those data we can generate a plot to show how time depends on size
of problem (on total number of projects).  Assuming that both "old" ane
"new" are P i.e. are O(n^k), where 'n' is number of projects, this
dependency is best examined on loglog scale.

      y = x^k
  log y = log (x^k) = k * log x
      v = k * u,  where u = log x, v = log y

This plot, together with fit of cubic polynomial and linear function
to the data, is shown in

  https://git.wiki.kernel.org/index.php/File:Benchmark_find_forks-old_vs_new.png

(I'm sorry for abusing git wiki for sharing images...).
-- 
Jakub Narebski
Poland

[-- Attachment #2: benchmark_find_forks.perl --]
[-- Type: text/plain, Size: 6721 bytes --]

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw(first);
use POSIX qw(ceil);

use Acme::MetaSyntactic qw(any);
use Benchmark qw(:all :hireswallclock);
use Dumbbench;

use Data::Dumper::Concise; # for debugging


# ----------------------------------------------------------------------

# old: version used to be in git_get_projects_list for $projects_list file
sub filter__git_get_projects_list {
	my ($projects, $need_sorting) = @_;

	@$projects = sort { $a->{'path'} cmp $b->{'path'} } @$projects
		if ($need_sorting);

	my (@filtered, %paths);
 PROJECT:
	foreach my $pr (@$projects) {
		my $path = $pr->{'path'};

		# below is v1.7.5.1:gitweb/gitweb.perl lines 2725 and later
	PATH:
		foreach my $filter (keys %paths) {
			# looking for forks
			my $pfx = substr($path, 0, length($filter));
			if ($pfx ne $filter) {
				next PATH;
			}
			my $sfx = substr($path, length($filter));
			if ($sfx !~ /^\/.*\.git$/) {
				next PATH;
			}
			# is a fork, don't include it in
			# the list
			next PROJECT;
		}

		# [...]
		# below is v1.7.5.1:gitweb/gitweb.perl line 2746 and later
		push @filtered, $pr;
		(my $forks_path = $path) =~ s/\.git$//;
		$paths{$forks_path}++;
	}

	return @filtered;
}


# new: using 'path' only, using trie; code borrowed from Tree::Trie (Perl Artistic License)
sub filter_forks_from_projects_list {
	my ($projects, $need_sorting) = @_;

	@$projects = sort { $a->{'path'} cmp $b->{'path'} } @$projects
		if ($need_sorting);

	my %trie; # prefix tree of directories (path components)
	# generate trie out of those directories that might contain forks
	foreach my $pr (@$projects) {
		my $path = $pr->{'path'};
		$path =~ s/\.git$//;      # forks of 'repo.git' are in 'repo/' directory
		next if ($path =~ m!/$!); # skip non-bare repositories, e.g. 'repo/.git'
		next unless ($path);      # skip '.git' repository: tests, git-instaweb
		#next unless (-d $path);  # containing directory exists (NOT IN TESTS!!!)
		$pr->{'forks'} = [];      # there can be 0 or more forks of project

		# add to trie
		my @dirs = split('/', $path);
		# walk the trie, until either runs out of components or out of trie
		my $ref = \%trie;
		while (scalar @dirs &&
		       exists($ref->{$dirs[0]})) {
			$ref = $ref->{shift @dirs};
		}
		# create rest of trie structure from rest of components
		foreach my $dir (@dirs) {
			$ref = $ref->{$dir} = {};
		}
		# create end marker, store $pr as a data
		$ref->{''} = $pr if (!exists $ref->{''});
	}

	# filter out forks, by finding shortest prefix match for paths
	my @filtered;
 PROJECT:
	foreach my $pr (@$projects) {
		# trie lookup
		my $ref = \%trie;
	DIR:
		foreach my $dir (split('/', $pr->{'path'})) {
			if (exists $ref->{''}) {
				# found [shortest] prefix, is a fork - skip it
				push @{$ref->{''}{'forks'}}, $pr;
				next PROJECT;
			}
			if (!exists $ref->{$dir}) {
				# not in trie, cannot have prefix, not a fork
				push @filtered, $pr;
				next PROJECT;
			}
			# If the dir is there, we just walk one step down the trie.
			$ref = $ref->{$dir};
		}
		# we ran out of trie
		# (shouldn't happen: it's either no match, or end marker)
		push @filtered, $pr;
	}

	return @filtered;
}

# ----------------------------------------------------------------------

sub prepare_projects {
	my ($nprojects, $nforked, $nforks_per_project, $exclude_forks) = @_;
	my @projects;

	while ($nprojects > 0) {
		push @projects, metaname().".git";
		$nprojects--;
		last unless $nprojects;

		if (rand() < (1.0*$nforked/$nprojects)) {
			(my $base = $projects[-1]) =~ s/\.git$//;
			push @projects, map { "$base/$_.git" } metaname($nforks_per_project);
			$nprojects -= $nforks_per_project unless $exclude_forks;
			$nforked--;
		}
	}

	#print Dumper(\@projects);
	@projects = map { { 'path' => $_ } } @projects;

	return @projects;
}

# ######################################################################

my (@projects, @filtered);

# repo.or.cz
# * 4038 projects (without forks?), based on http://repo.or.cz/w?a=project_index
# * 4024 projects, based on "grep -c -e '<tr class="'" on http://repo.or.cz/w?a=project_list
# * 3428 active projects, based on "grep -c -e '"age'" on http://repo.or.cz/w?a=project_list
# *  308 projects have forks (7.6-10%), based on grep -c -e '/forks">+</a>' project_list.html
# * nonrepresentative sample of nforks: 6, 2, 1, 1, 0, 0, 1, 0, 1, 2, 3, 3, 3, 55+, 1, 1, 1

my $nprojects = 100;
my $nforked_frac = 0.05;
my $nforked   = ceil($nforked_frac*$nprojects); # 5%
my $nforks    = 5;

$nprojects = 4000;
$nforked   = 300;
$nforks    = 1;

print "plan: nprojects=$nprojects; nforked=$nforked; nforks=$nforks\n";
@projects = prepare_projects($nprojects, $nforked, $nforks);
print "number of projects: ".(scalar @projects)."\n";

print "\n";
my $results = timethese(-600, {
	'old' => sub { filter__git_get_projects_list(\@projects) },
	'new' => sub { filter_forks_from_projects_list(\@projects) },
});
#print Dumper($results);
cmpthese($results);
print "\n";

print "Dumbbench: old, new\n";
my $bench = Dumbbench->new(
	target_rel_precision => 0.005, # seek ~0.5%
	initial_runs         => 50,    # the higher the more reliable
);
$bench->add_instances(
	Dumbbench::Instance::PerlSub->new(
		name => 'old',
		code => sub { filter__git_get_projects_list(\@projects) },
	),
	Dumbbench::Instance::PerlSub->new(
		name => 'new',
		code => sub { filter_forks_from_projects_list(\@projects) },
	),
);
$bench->run();
$bench->report();

print "\n";
#print Dumper($bench->instances());
__END__

print "#####################################################################\n".
      "# nforks  = $nforks (forks per project forked)\n".
      "# nforked = ".(sprintf "%.2f%%", $nforked_frac*100.0)."\n".
      "# 1:projects 2:nforked  3:old 4:old_err  5:new 5:new_err\n";
for (my $i = 50; $i <= 4000; $i *= 2) {
	$nforked  = ceil($nforked_frac*$i);
	@projects = prepare_projects($i, $nforked, $nforks);

	$bench = Dumbbench->new(
		target_rel_precision => 0.005, # seek ~0.5%
		initial_runs         => 50,    # the higher the more reliable
	);
	$bench->add_instances(
		Dumbbench::Instance::PerlSub->new(
			name => 'old',
			code => sub { filter__git_get_projects_list(\@projects) },
		),
		Dumbbench::Instance::PerlSub->new(
			name => 'new',
			code => sub { filter_forks_from_projects_list(\@projects) },
		),
	);
	$bench->run();

	my %instances = map { $_->name() => $_ } $bench->instances();

	printf "%4d %3d ", (scalar @projects), $nforked;
	foreach my $name (qw(old new)) {
		my $result = $instances{$name}->result();
		print " ".$result->number();
		print " ".$result->error()->[0];
	}
	print "\n";
}
#print Dumper($bench->instances());
#print Dumper($_->result()) foreach ($bench->instances());


print "\n";


1;
__END__

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-05-07 19:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
2011-05-07 18:39   ` Jakub Narebski
2011-04-29 17:51 ` [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled Jakub Narebski
2011-04-29 17:51 ` [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo) Jakub Narebski
2011-04-29 17:51 ` [PATCH 4/6] gitweb: Split git_project_list_body in two functions Jakub Narebski
2011-04-29 17:52 ` [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic Jakub Narebski
2011-04-29 17:52 ` [PATCH 6/6] gitweb: Optional grouping of projects by category Jakub Narebski
2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
2011-04-29 23:53   ` Jakub Narebski
2011-04-30 20:36   ` Øyvind A. Holm
2011-05-03 14:02     ` git-send-email and non 7bit clean message (was: [PATCH 0/6] gitweb: Improve ctags, introduce categories) Jakub Narebski
2011-05-04 13:50       ` [PATCH/RFC] git-send-email: Do not encode Subject if not required (was: git-send-email and non 7bit clean message) Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).