* [PATCH 1/6 (v2)] gitweb: Restructure projects list generation
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
2011-05-07 18:39 ` Jakub Narebski
2011-04-29 17:51 ` [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled Jakub Narebski
` (5 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
Jakub Narebski
Extract filtering out forks (which is done if 'forks' feature is
enabled) into filter_forks_from_projects_list subroutine, and
searching projects (via projects search form, or via content tags)
into search_projects_list subroutine.
Both are now run _before_ displaying projects, and not while printing;
this allow to know upfront if there were any found projects. Gitweb
now can and do print 'No such projects found' if user searches for
phrase which does not correspond to any project (any repository).
This also would allow splitting projects list into pages, if we so
desire.
Filtering out forks and marking repository (project) as having forks
is now consolidated into one subroutine (special case of handling
forks in git_get_projects_list only for $projects_list being file is
now removed). Forks handling is also cleaned up and simplified.
$pr->{'forks'} now contains un-filled list of forks; we can now also
detect situation where the way for having forks is prepared, but there
are no forks yet.
Sorting projects got also refactored in a very straight way (just
moving code) into sort_projects_list subroutine.
The interaction between forks, content tags and searching is now made
more explicit: searching whether by tag, or via search form turns off
fork filtering (gitweb searches also forks, and will show all
results). If 'ctags' feature is disabled, then searching by tag is
too.
The t9500 test now includes some basic test for 'forks' and 'ctags'
features; the t9502 includes test checking if gitweb correctly filters
out forks.
Generating list of projects by scanning given directory is now also a
bit simplified wrt. handling filtering; it is byproduct of extracting
filtering forks to separate subroutine.
While at it we now detect that there are no projects and respond with
"404 No projects found" also for 'project_index' and 'opml' actions.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was sent to git mailing list as
[PATCHv2/RFC] gitweb: Restructure projects list generation
http://thread.gmane.org/gmane.comp.version-control.git/167996/focus=168321
The main difference from previous version
[PATCH/RFC] gitweb: Restructure projects list generation
http://thread.gmane.org/gmane.comp.version-control.git/167996
is that newly introduced filter_forks_from_projects_list() no longer
relies on the assumption that forked project immediately precedes all
of its forks... which is not true when $porjects_list is a directory
and we have e.g. such situation:
repo.git
repo.related.git
repo_other.git
repo/fork.git
This issue was noticed by Jonathan Nieder.
filter_forks_from_projects_list() uses trie[1] (prefix tree) to find
which repositories are forks of other projects; directories are used
as "letters" in trie. Algorithm to create trie and find prefix match
was created with some help of code of Tree::Tree[2] Perl module from
CPAN.
[1]: http://en.wikipedia.org/wiki/Trie
[2]: http://p3rl.org/Tree::Trie
This revision includes additionally tests in t9502 that correct
projects are filtered.
Note that before this commit filtering out forks worked only for
$projects_list being a file, it assumed that forks are always after
forked project (not very unreasonable), and used algorithm which is
O(<projects>^2).
Current code filters out projects only where it should, regardless
whether $projects_list is directory or a file, and trie-based
algorithm is O(<projects>) + O(<projects forked>*<max depth>).
Current code also consistently finds shortest prefix (important in
counting forks in case of forks of forks).
There remains a question whether $pr->{'forks'} should be a list of
forks, or would having $pr->{'forks'} being number of forks be enough.
Refactoring sort_projects_list and fixing handling of $project_maxdepth
could be done in a separate patches.
gitweb/gitweb.perl | 250 ++++++++++++++++++++---------
t/t9500-gitweb-standalone-no-errors.sh | 49 ++++++
t/t9502-gitweb-standalone-parse-output.sh | 74 +++++++++
3 files changed, 296 insertions(+), 77 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index ee69ea6..014b33b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2651,21 +2651,23 @@ sub git_get_project_url_list {
}
sub git_get_projects_list {
- my ($filter) = @_;
+ my $filter = shift || '';
my @list;
- $filter ||= '';
$filter =~ s/\.git$//;
- my $check_forks = gitweb_check_feature('forks');
-
if (-d $projects_list) {
# search in directory
- my $dir = $projects_list . ($filter ? "/$filter" : '');
+ my $dir = $projects_list;
# remove the trailing "/"
$dir =~ s!/+$!!;
- my $pfxlen = length("$dir");
- my $pfxdepth = ($dir =~ tr!/!!);
+ my $pfxlen = length("$projects_list");
+ my $pfxdepth = ($projects_list =~ tr!/!!);
+ # when filtering, search only given subdirectory
+ if ($filter) {
+ $dir .= "/$filter";
+ $dir =~ s!/+$!!;
+ }
File::Find::find({
follow_fast => 1, # follow symbolic links
@@ -2680,14 +2682,14 @@ sub git_get_projects_list {
# only directories can be git repositories
return unless (-d $_);
# don't traverse too deep (Find is super slow on os x)
+ # $project_maxdepth excludes depth of $projectroot
if (($File::Find::name =~ tr!/!!) - $pfxdepth > $project_maxdepth) {
$File::Find::prune = 1;
return;
}
- my $subdir = substr($File::Find::name, $pfxlen + 1);
+ my $path = substr($File::Find::name, $pfxlen + 1);
# we check related file in $projectroot
- my $path = ($filter ? "$filter/" : '') . $subdir;
if (check_export_ok("$projectroot/$path")) {
push @list, { path => $path };
$File::Find::prune = 1;
@@ -2700,7 +2702,6 @@ sub git_get_projects_list {
# 'git%2Fgit.git Linus+Torvalds'
# 'libs%2Fklibc%2Fklibc.git H.+Peter+Anvin'
# 'linux%2Fhotplug%2Fudev.git Greg+Kroah-Hartman'
- my %paths;
open my $fd, '<', $projects_list or return;
PROJECT:
while (my $line = <$fd>) {
@@ -2711,32 +2712,9 @@ sub git_get_projects_list {
if (!defined $path) {
next;
}
- if ($filter ne '') {
- # looking for forks;
- my $pfx = substr($path, 0, length($filter));
- if ($pfx ne $filter) {
- next PROJECT;
- }
- my $sfx = substr($path, length($filter));
- if ($sfx !~ /^\/.*\.git$/) {
- next PROJECT;
- }
- } elsif ($check_forks) {
- PATH:
- foreach my $filter (keys %paths) {
- # looking for forks;
- my $pfx = substr($path, 0, length($filter));
- if ($pfx ne $filter) {
- next PATH;
- }
- my $sfx = substr($path, length($filter));
- if ($sfx !~ /^\/.*\.git$/) {
- next PATH;
- }
- # is a fork, don't include it in
- # the list
- next PROJECT;
- }
+ # if $filter is rpovided, check if $path begins with $filter
+ if ($filter && $path !~ m!^\Q$filter\E/!) {
+ next;
}
if (check_export_ok("$projectroot/$path")) {
my $pr = {
@@ -2744,8 +2722,6 @@ sub git_get_projects_list {
owner => to_utf8($owner),
};
push @list, $pr;
- (my $forks_path = $path) =~ s/\.git$//;
- $paths{$forks_path}++;
}
}
close $fd;
@@ -2753,6 +2729,98 @@ sub git_get_projects_list {
return @list;
}
+# written with help of Tree::Trie module (Perl Artistic License, GPL compatibile)
+# as side effects it sets 'forks' field to list of forks for forked projects
+sub filter_forks_from_projects_list {
+ my $projects = shift;
+
+ my %trie; # prefix tree of directories (path components)
+ # generate trie out of those directories that might contain forks
+ foreach my $pr (@$projects) {
+ my $path = $pr->{'path'};
+ $path =~ s/\.git$//; # forks of 'repo.git' are in 'repo/' directory
+ next if ($path =~ m!/$!); # skip non-bare repositories, e.g. 'repo/.git'
+ next unless ($path); # skip '.git' repository: tests, git-instaweb
+ next unless (-d $path); # containing directory exists
+ $pr->{'forks'} = []; # there can be 0 or more forks of project
+
+ # add to trie
+ my @dirs = split('/', $path);
+ # walk the trie, until either runs out of components or out of trie
+ my $ref = \%trie;
+ while (scalar @dirs &&
+ exists($ref->{$dirs[0]})) {
+ $ref = $ref->{shift @dirs};
+ }
+ # create rest of trie structure from rest of components
+ foreach my $dir (@dirs) {
+ $ref = $ref->{$dir} = {};
+ }
+ # create end marker, store $pr as a data
+ $ref->{''} = $pr if (!exists $ref->{''});
+ }
+
+ # filter out forks, by finding shortest prefix match for paths
+ my @filtered;
+ PROJECT:
+ foreach my $pr (@$projects) {
+ # trie lookup
+ my $ref = \%trie;
+ DIR:
+ foreach my $dir (split('/', $pr->{'path'})) {
+ if (exists $ref->{''}) {
+ # found [shortest] prefix, is a fork - skip it
+ push @{$ref->{''}{'forks'}}, $pr;
+ next PROJECT;
+ }
+ if (!exists $ref->{$dir}) {
+ # not in trie, cannot have prefix, not a fork
+ push @filtered, $pr;
+ next PROJECT;
+ }
+ # If the dir is there, we just walk one step down the trie.
+ $ref = $ref->{$dir};
+ }
+ # we ran out of trie
+ # (shouldn't happen: it's either no match, or end marker)
+ push @filtered, $pr;
+ }
+
+ return @filtered;
+}
+
+# note: fill_project_list_info must be run first,
+# for 'descr_long' and 'ctags' to be filled
+sub search_projects_list {
+ my ($projlist, %opts) = @_;
+ my $tagfilter = $opts{'tagfilter'};
+ my $searchtext = $opts{'searchtext'};
+
+ return @$projlist
+ unless ($tagfilter || $searchtext);
+
+ my @projects;
+ PROJECT:
+ foreach my $pr (@$projlist) {
+
+ if ($tagfilter) {
+ next unless ref($pr->{'ctags'}) eq 'HASH';
+ next unless
+ grep { lc($_) eq lc($tagfilter) } keys %{$pr->{'ctags'}};
+ }
+
+ if ($searchtext) {
+ next unless
+ $pr->{'path'} =~ /$searchtext/ ||
+ $pr->{'descr_long'} =~ /$searchtext/;
+ }
+
+ push @projects, $pr;
+ }
+
+ return @projects;
+}
+
our $gitweb_project_owner = undef;
sub git_get_project_list_from_file {
@@ -4742,7 +4810,7 @@ sub git_patchset_body {
# project in the list, removing invalid projects from returned list
# NOTE: modifies $projlist, but does not remove entries from it
sub fill_project_list_info {
- my ($projlist, $check_forks) = @_;
+ my $projlist = shift;
my @projects;
my $show_ctags = gitweb_check_feature('ctags');
@@ -4762,23 +4830,36 @@ sub fill_project_list_info {
if (!defined $pr->{'owner'}) {
$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
}
- if ($check_forks) {
- my $pname = $pr->{'path'};
- if (($pname =~ s/\.git$//) &&
- ($pname !~ /\/$/) &&
- (-d "$projectroot/$pname")) {
- $pr->{'forks'} = "-d $projectroot/$pname";
- } else {
- $pr->{'forks'} = 0;
- }
+ if ($show_ctags) {
+ $pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
}
- $show_ctags and $pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
push @projects, $pr;
}
return @projects;
}
+sub sort_projects_list {
+ my ($projlist, $order) = @_;
+ my @projects;
+
+ my %order_info = (
+ project => { key => 'path', type => 'str' },
+ descr => { key => 'descr_long', type => 'str' },
+ owner => { key => 'owner', type => 'str' },
+ age => { key => 'age', type => 'num' }
+ );
+ my $oi = $order_info{$order};
+ return @$projlist unless defined $oi;
+ if ($oi->{'type'} eq 'str') {
+ @projects = sort {$a->{$oi->{'key'}} cmp $b->{$oi->{'key'}}} @$projlist;
+ } else {
+ @projects = sort {$a->{$oi->{'key'}} <=> $b->{$oi->{'key'}}} @$projlist;
+ }
+
+ return @projects;
+}
+
# print 'sort by' <th> element, generating 'sort by $name' replay link
# if that order is not selected
sub print_sort_th {
@@ -4805,28 +4886,39 @@ sub format_sort_th {
sub git_project_list_body {
# actually uses global variable $project
my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
+ my @projects = @$projlist;
my $check_forks = gitweb_check_feature('forks');
- my @projects = fill_project_list_info($projlist, $check_forks);
+ my $show_ctags = gitweb_check_feature('ctags');
+ my $tagfilter = $show_ctags ? $cgi->param('by_tag') : undef;
+ $check_forks = undef
+ if ($tagfilter || $searchtext);
+
+ # filtering out forks before filling info allows to do less work
+ @projects = filter_forks_from_projects_list(\@projects)
+ if ($check_forks);
+ @projects = fill_project_list_info(\@projects);
+ # searching projects require filling to be run before it
+ @projects = search_projects_list(\@projects,
+ 'searchtext' => $searchtext,
+ 'tagfilter' => $tagfilter)
+ if ($tagfilter || $searchtext);
$order ||= $default_projects_order;
$from = 0 unless defined $from;
$to = $#projects if (!defined $to || $#projects < $to);
- my %order_info = (
- project => { key => 'path', type => 'str' },
- descr => { key => 'descr_long', type => 'str' },
- owner => { key => 'owner', type => 'str' },
- age => { key => 'age', type => 'num' }
- );
- my $oi = $order_info{$order};
- if ($oi->{'type'} eq 'str') {
- @projects = sort {$a->{$oi->{'key'}} cmp $b->{$oi->{'key'}}} @projects;
- } else {
- @projects = sort {$a->{$oi->{'key'}} <=> $b->{$oi->{'key'}}} @projects;
+ # short circuit
+ if ($from > $to) {
+ print "<center>\n".
+ "<b>No such projects found</b><br />\n".
+ "Click ".$cgi->a({-href=>href(project=>undef)},"here")." to view all projects<br />\n".
+ "</center>\n<br />\n";
+ return;
}
- my $show_ctags = gitweb_check_feature('ctags');
+ @projects = sort_projects_list(\@projects, $order);
+
if ($show_ctags) {
my %ctags;
foreach my $p (@projects) {
@@ -4852,32 +4944,26 @@ sub git_project_list_body {
"</tr>\n";
}
my $alternate = 1;
- my $tagfilter = $cgi->param('by_tag');
for (my $i = $from; $i <= $to; $i++) {
my $pr = $projects[$i];
- next if $tagfilter and $show_ctags and not grep { lc $_ eq lc $tagfilter } keys %{$pr->{'ctags'}};
- next if $searchtext and not $pr->{'path'} =~ /$searchtext/
- and not $pr->{'descr_long'} =~ /$searchtext/;
- # Weed out forks or non-matching entries of search
- if ($check_forks) {
- my $forkbase = $project; $forkbase ||= ''; $forkbase =~ s#\.git$#/#;
- $forkbase="^$forkbase" if $forkbase;
- next if not $searchtext and not $tagfilter and $show_ctags
- and $pr->{'path'} =~ m#$forkbase.*/.*#; # regexp-safe
- }
-
if ($alternate) {
print "<tr class=\"dark\">\n";
} else {
print "<tr class=\"light\">\n";
}
$alternate ^= 1;
+
if ($check_forks) {
print "<td>";
if ($pr->{'forks'}) {
- print "<!-- $pr->{'forks'} -->\n";
- print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "+");
+ my $nforks = scalar @{$pr->{'forks'}};
+ if ($nforks > 0) {
+ print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks"),
+ -title => "$nforks forks"}, "+");
+ } else {
+ print $cgi->span({-title => "$nforks forks"}, "+");
+ }
}
print "</td>\n";
}
@@ -5357,7 +5443,10 @@ sub git_forks {
}
sub git_project_index {
- my @projects = git_get_projects_list($project);
+ my @projects = git_get_projects_list();
+ if (!@projects) {
+ die_error(404, "No projects found");
+ }
print $cgi->header(
-type => 'text/plain',
@@ -5399,7 +5488,11 @@ sub git_summary {
my $check_forks = gitweb_check_feature('forks');
if ($check_forks) {
+ # find forks of a project
@forklist = git_get_projects_list($project);
+ # filter out forks of forks
+ @forklist = filter_forks_from_projects_list(\@forklist)
+ if (@forklist);
}
git_header_html();
@@ -7319,6 +7412,9 @@ sub git_atom {
sub git_opml {
my @list = git_get_projects_list();
+ if (!@list) {
+ die_error(404, "No projects found");
+ }
print $cgi->header(
-type => 'text/xml',
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index afac5b5..71ef0ac 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -595,4 +595,53 @@ test_expect_success HIGHLIGHT \
git commit -m "Add test.sh" &&
gitweb_run "p=.git;a=blob;f=test.sh"'
+# ----------------------------------------------------------------------
+# forks of projects
+
+cat >>gitweb_config.perl <<\EOF &&
+$feature{'forks'}{'default'} = [1];
+EOF
+
+test_expect_success \
+ 'forks: prepare' \
+ 'git init --bare foo.git &&
+ git --git-dir=foo.git --work-tree=. add file &&
+ git --git-dir=foo.git --work-tree=. commit -m "Initial commit" &&
+ echo "foo" > foo.git/description &&
+ mkdir -p foo &&
+ (cd foo &&
+ git clone --shared --bare ../foo.git foo-forked.git &&
+ echo "fork of foo" > foo-forked.git/description)'
+
+test_expect_success \
+ 'forks: projects list' \
+ 'gitweb_run'
+
+test_expect_success \
+ 'forks: forks action' \
+ 'gitweb_run "p=foo.git;a=forks"'
+
+# ----------------------------------------------------------------------
+# content tags (tag cloud)
+
+cat >>gitweb_config.perl <<-\EOF &&
+# we don't test _setting_ content tags, so any true value is good
+$feature{'ctags'}{'default'} = ['ctags_script.cgi'];
+EOF
+
+test_expect_success \
+ 'ctags: tag cloud in projects list' \
+ 'mkdir .git/ctags &&
+ echo "2" > .git/ctags/foo &&
+ echo "1" > .git/ctags/bar &&
+ gitweb_run'
+
+test_expect_success \
+ 'ctags: search projects by existing tag' \
+ 'gitweb_run "by_tag=foo"'
+
+test_expect_success \
+ 'ctags: search projects by non existent tag' \
+ 'gitweb_run "by_tag=non-existent"'
+
test_done
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index dd83890..731e64c 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -112,4 +112,78 @@ test_expect_success 'snapshot: hierarchical branch name (xx/test)' '
'
test_debug 'cat gitweb.headers'
+# ----------------------------------------------------------------------
+# forks of projects
+
+test_expect_success 'forks: setup' '
+ git init --bare foo.git &&
+ echo file > file &&
+ git --git-dir=foo.git --work-tree=. add file &&
+ git --git-dir=foo.git --work-tree=. commit -m "Initial commit" &&
+ echo "foo" > foo.git/description &&
+ git clone --bare foo.git foo.bar.git &&
+ echo "foo.bar" > foo.bar.git/description &&
+ git clone --bare foo.git foo_baz.git &&
+ echo "foo_baz" > foo_baz.git/description &&
+ rm -fr foo &&
+ mkdir -p foo &&
+ (
+ cd foo &&
+ git clone --shared --bare ../foo.git foo-forked.git &&
+ echo "fork of foo" > foo-forked.git/description
+ )
+'
+
+test_expect_success 'forks: not skipped unless "forks" feature enabled' '
+ gitweb_run "a=project_list" &&
+ grep -q ">\\.git<" gitweb.body &&
+ grep -q ">foo\\.git<" gitweb.body &&
+ grep -q ">foo_baz\\.git<" gitweb.body &&
+ grep -q ">foo\\.bar\\.git<" gitweb.body &&
+ grep -q ">foo_baz\\.git<" gitweb.body &&
+ grep -q ">foo/foo-forked\\.git<" gitweb.body &&
+ grep -q ">fork of .*<" gitweb.body
+'
+
+cat >>gitweb_config.perl <<\EOF &&
+$feature{'forks'}{'default'} = [1];
+EOF
+
+test_expect_success 'forks: forks skipped if "forks" feature enabled' '
+ gitweb_run "a=project_list" &&
+ grep -q ">\\.git<" gitweb.body &&
+ grep -q ">foo\\.git<" gitweb.body &&
+ grep -q ">foo_baz\\.git<" gitweb.body &&
+ grep -q ">foo\\.bar\\.git<" gitweb.body &&
+ grep -q ">foo_baz\\.git<" gitweb.body &&
+ grep -v ">foo/foo-forked\\.git<" gitweb.body &&
+ grep -v ">fork of .*<" gitweb.body
+'
+
+test_expect_success 'forks: "forks" action for forked repository' '
+ gitweb_run "p=foo.git;a=forks" &&
+ grep -q ">foo/foo-forked\\.git<" gitweb.body &&
+ grep -q ">fork of foo<" gitweb.body
+'
+
+test_expect_success 'forks: can access forked repository' '
+ gitweb_run "p=foo/foo-forked.git;a=summary" &&
+ grep -q "200 OK" gitweb.headers &&
+ grep -q ">fork of foo<" gitweb.body
+'
+
+test_expect_success 'forks: project_index lists all projects (incl. forks)' '
+ cat >expected <<-\EOF
+ .git
+ foo.bar.git
+ foo.git
+ foo/foo-forked.git
+ foo_baz.git
+ EOF
+ gitweb_run "a=project_index" &&
+ sed -e "s/ .*//" <gitweb.body | sort >actual &&
+ test_cmp expected actual
+'
+
+
test_done
--
1.7.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/6 (v2)] gitweb: Restructure projects list generation
2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
@ 2011-05-07 18:39 ` Jakub Narebski
0 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-05-07 18:39 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey
[-- Attachment #1: Type: text/plain, Size: 6494 bytes --]
On Fri, 29 Apr 2011, Jakub Narebski wrote:
> Extract filtering out forks (which is done if 'forks' feature is
> enabled) into filter_forks_from_projects_list subroutine, and
> searching projects (via projects search form, or via content tags)
> into search_projects_list subroutine.
If it was all this patch did, it would be probably easier to review.
Should I then split this patch into smaller, though not entirely
standalone patches: pure refactoring, improving finding forks, and
adding tests?
> Sorting projects got also refactored in a very straight way (just
> moving code) into sort_projects_list subroutine.
This also could be refactored in a separate patch... though I rather
have all refactorings together, so one can see the result: clean code.
> The interaction between forks, content tags and searching is now made
> more explicit: searching whether by tag, or via search form turns off
> fork filtering (gitweb searches also forks, and will show all
> results). If 'ctags' feature is disabled, then searching by tag is
> too.
This is functional change that makes code more clear, and I think it
makes better behavior... but it changes how gitweb behaves, so it is
not pure refactoring. Should I put it into a bit artificial separate
patch?
> While at it we now detect that there are no projects and respond with
> "404 No projects found" also for 'project_index' and 'opml' actions.
This "while at it" is here because we can now say "No projects found"
for 'projects_list' action, because filtering / searching is done
upfront, so we know if we found anything.
But again, this perhaps would be better put in a separate patch.
[...]
> filter_forks_from_projects_list() uses trie[1] (prefix tree) to find
> which repositories are forks of other projects; directories are used
> as "letters" in trie. Algorithm to create trie and find prefix match
> was created with some help of code of Tree::Tree[2] Perl module from
> CPAN.
>
> [1]: http://en.wikipedia.org/wiki/Trie
> [2]: http://p3rl.org/Tree::Trie
This change in algorithm should perhaps be put in a separate patch...
> Note that before this commit filtering out forks worked only for
> $projects_list being a file, it assumed that forks are always after
> forked project (not very unreasonable), and used algorithm which is
> O(<projects>^2).
>
> Current code filters out projects only where it should, regardless
> whether $projects_list is directory or a file, and trie-based
> algorithm is O(<projects>) + O(<projects forked>*<max depth>).
> Current code also consistently finds shortest prefix (important in
> counting forks in case of forks of forks).
Jakub Narebski wrote in
"Re: What's cooking in git.git (May 2011, #02; Wed, 4)"
http://thread.gmane.org/gmane.comp.version-control.git/172789/focus=172820
> Junio C Hamano <gitster@pobox.com> writes:
[...]
>> * jn/ctags (2011-04-29) 6 commits
>> - gitweb: Optional grouping of projects by category
>> - gitweb: Modularized git_get_project_description to be more generic
>> - gitweb: Split git_project_list_body in two functions
>> - gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)
>> - gitweb: Change the way "content tags" ('ctags') are handled
>> - gitweb: Restructure projects list generation
>>
>> Waiting for comments.
>
> Should I do and post benchmarks for
>
> - gitweb: Restructure projects list generation
>
> change (when 'forks' feature is used)?
It is not easy to compare old and new code, because those algorithm
have different behavior.
Best I can come up with is comparing pure fork detection (not the time
it takes to generate page) - I can easy mock that - for sizes comparable
with http://repo.or.cz which uses 'forks' feature. repo.or.cz has
around 4,000 projects, around 300 of those have forks (note: forks are
not included in number of projects). From non-representative sample
most forked project have but single fork; note however that there are
repositories such as git.git, which has 55 forks, and some of those have
forks on its own (forks of forks).
plan: nprojects=4000; nprojects_forked=300; nforks_per_forked_project=1
number of projects: 4000
Benchmark: running old, new for at least 600 CPU seconds...
old: 1231.38 wallclock secs (620.23 usr + 1.30 sys = 621.53 CPU) @ 0.03/s (n=21)
new: 847.6 wallclock secs (632.34 usr + 0.80 sys = 633.14 CPU) @ 5.18/s (n=3277)
s/iter old new
old 29.6 -- -99%
new 0.193 15219% --
Dumbbench: running old, new for at least 50 iterations...
old: Ran 106 iterations (25 outliers).
old: Rounded run time per iteration: 3.316e+01 +/- 1.5e-01 (0.5%)
new: Ran 66 iterations (15 outliers).
new: Rounded run time per iteration: 2.0213e-01 +/- 2.1e-04 (0.1%)
Below there is table of time versus number of projects, used to generate
attached and linked images. Data was generated using Dumbbench module
from CPAN.
########################################################
# nforks = 5 (forks per project forked)
# nforked = 5.00%
# 1:projects 2:nforked 3:old 4:old_err 5:new 5:new_err
# 0 0 8.959e-06 2.2e-08 1.0137e-05 2.0e-08
50 3 3.211e-03 1.6e-05 2.0524e-03 5.0e-06
103 5 1.4003e-02 6.7e-05 4.477e-03 2.2e-05
204 10 5.647e-02 2.3e-04 9.464e-03 4.6e-05
402 20 2.1939e-01 3.2e-04 1.9943e-02 9.6e-05
500 25 3.666e-01 3.6e-03 2.706e-02 2.5e-04
800 40 9.124e-01 1.7e-03 4.175e-02 2.1e-04
1000 50 1.575e+00 1.3e-02 5.896e-02 5.3e-04
1600 80 3.738e+00 1.2e-02 9.347e-02 4.3e-04
2004 100 6.820e+00 6.7e-02 1.1216e-01 3.6e-04
3203 160 1.8189e+01 8.5e-02 1.8144e-01 4.5e-04
4003 200 2.927e+01 2.9e-01 2.2939e-01 5.7e-04
Script used to generate this data is attached.
From those data we can generate a plot to show how time depends on size
of problem (on total number of projects). Assuming that both "old" ane
"new" are P i.e. are O(n^k), where 'n' is number of projects, this
dependency is best examined on loglog scale.
y = x^k
log y = log (x^k) = k * log x
v = k * u, where u = log x, v = log y
This plot, together with fit of cubic polynomial and linear function
to the data, is shown in
https://git.wiki.kernel.org/index.php/File:Benchmark_find_forks-old_vs_new.png
(I'm sorry for abusing git wiki for sharing images...).
--
Jakub Narebski
Poland
[-- Attachment #2: benchmark_find_forks.perl --]
[-- Type: text/plain, Size: 6721 bytes --]
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(first);
use POSIX qw(ceil);
use Acme::MetaSyntactic qw(any);
use Benchmark qw(:all :hireswallclock);
use Dumbbench;
use Data::Dumper::Concise; # for debugging
# ----------------------------------------------------------------------
# old: version used to be in git_get_projects_list for $projects_list file
sub filter__git_get_projects_list {
my ($projects, $need_sorting) = @_;
@$projects = sort { $a->{'path'} cmp $b->{'path'} } @$projects
if ($need_sorting);
my (@filtered, %paths);
PROJECT:
foreach my $pr (@$projects) {
my $path = $pr->{'path'};
# below is v1.7.5.1:gitweb/gitweb.perl lines 2725 and later
PATH:
foreach my $filter (keys %paths) {
# looking for forks
my $pfx = substr($path, 0, length($filter));
if ($pfx ne $filter) {
next PATH;
}
my $sfx = substr($path, length($filter));
if ($sfx !~ /^\/.*\.git$/) {
next PATH;
}
# is a fork, don't include it in
# the list
next PROJECT;
}
# [...]
# below is v1.7.5.1:gitweb/gitweb.perl line 2746 and later
push @filtered, $pr;
(my $forks_path = $path) =~ s/\.git$//;
$paths{$forks_path}++;
}
return @filtered;
}
# new: using 'path' only, using trie; code borrowed from Tree::Trie (Perl Artistic License)
sub filter_forks_from_projects_list {
my ($projects, $need_sorting) = @_;
@$projects = sort { $a->{'path'} cmp $b->{'path'} } @$projects
if ($need_sorting);
my %trie; # prefix tree of directories (path components)
# generate trie out of those directories that might contain forks
foreach my $pr (@$projects) {
my $path = $pr->{'path'};
$path =~ s/\.git$//; # forks of 'repo.git' are in 'repo/' directory
next if ($path =~ m!/$!); # skip non-bare repositories, e.g. 'repo/.git'
next unless ($path); # skip '.git' repository: tests, git-instaweb
#next unless (-d $path); # containing directory exists (NOT IN TESTS!!!)
$pr->{'forks'} = []; # there can be 0 or more forks of project
# add to trie
my @dirs = split('/', $path);
# walk the trie, until either runs out of components or out of trie
my $ref = \%trie;
while (scalar @dirs &&
exists($ref->{$dirs[0]})) {
$ref = $ref->{shift @dirs};
}
# create rest of trie structure from rest of components
foreach my $dir (@dirs) {
$ref = $ref->{$dir} = {};
}
# create end marker, store $pr as a data
$ref->{''} = $pr if (!exists $ref->{''});
}
# filter out forks, by finding shortest prefix match for paths
my @filtered;
PROJECT:
foreach my $pr (@$projects) {
# trie lookup
my $ref = \%trie;
DIR:
foreach my $dir (split('/', $pr->{'path'})) {
if (exists $ref->{''}) {
# found [shortest] prefix, is a fork - skip it
push @{$ref->{''}{'forks'}}, $pr;
next PROJECT;
}
if (!exists $ref->{$dir}) {
# not in trie, cannot have prefix, not a fork
push @filtered, $pr;
next PROJECT;
}
# If the dir is there, we just walk one step down the trie.
$ref = $ref->{$dir};
}
# we ran out of trie
# (shouldn't happen: it's either no match, or end marker)
push @filtered, $pr;
}
return @filtered;
}
# ----------------------------------------------------------------------
sub prepare_projects {
my ($nprojects, $nforked, $nforks_per_project, $exclude_forks) = @_;
my @projects;
while ($nprojects > 0) {
push @projects, metaname().".git";
$nprojects--;
last unless $nprojects;
if (rand() < (1.0*$nforked/$nprojects)) {
(my $base = $projects[-1]) =~ s/\.git$//;
push @projects, map { "$base/$_.git" } metaname($nforks_per_project);
$nprojects -= $nforks_per_project unless $exclude_forks;
$nforked--;
}
}
#print Dumper(\@projects);
@projects = map { { 'path' => $_ } } @projects;
return @projects;
}
# ######################################################################
my (@projects, @filtered);
# repo.or.cz
# * 4038 projects (without forks?), based on http://repo.or.cz/w?a=project_index
# * 4024 projects, based on "grep -c -e '<tr class="'" on http://repo.or.cz/w?a=project_list
# * 3428 active projects, based on "grep -c -e '"age'" on http://repo.or.cz/w?a=project_list
# * 308 projects have forks (7.6-10%), based on grep -c -e '/forks">+</a>' project_list.html
# * nonrepresentative sample of nforks: 6, 2, 1, 1, 0, 0, 1, 0, 1, 2, 3, 3, 3, 55+, 1, 1, 1
my $nprojects = 100;
my $nforked_frac = 0.05;
my $nforked = ceil($nforked_frac*$nprojects); # 5%
my $nforks = 5;
$nprojects = 4000;
$nforked = 300;
$nforks = 1;
print "plan: nprojects=$nprojects; nforked=$nforked; nforks=$nforks\n";
@projects = prepare_projects($nprojects, $nforked, $nforks);
print "number of projects: ".(scalar @projects)."\n";
print "\n";
my $results = timethese(-600, {
'old' => sub { filter__git_get_projects_list(\@projects) },
'new' => sub { filter_forks_from_projects_list(\@projects) },
});
#print Dumper($results);
cmpthese($results);
print "\n";
print "Dumbbench: old, new\n";
my $bench = Dumbbench->new(
target_rel_precision => 0.005, # seek ~0.5%
initial_runs => 50, # the higher the more reliable
);
$bench->add_instances(
Dumbbench::Instance::PerlSub->new(
name => 'old',
code => sub { filter__git_get_projects_list(\@projects) },
),
Dumbbench::Instance::PerlSub->new(
name => 'new',
code => sub { filter_forks_from_projects_list(\@projects) },
),
);
$bench->run();
$bench->report();
print "\n";
#print Dumper($bench->instances());
__END__
print "#####################################################################\n".
"# nforks = $nforks (forks per project forked)\n".
"# nforked = ".(sprintf "%.2f%%", $nforked_frac*100.0)."\n".
"# 1:projects 2:nforked 3:old 4:old_err 5:new 5:new_err\n";
for (my $i = 50; $i <= 4000; $i *= 2) {
$nforked = ceil($nforked_frac*$i);
@projects = prepare_projects($i, $nforked, $nforks);
$bench = Dumbbench->new(
target_rel_precision => 0.005, # seek ~0.5%
initial_runs => 50, # the higher the more reliable
);
$bench->add_instances(
Dumbbench::Instance::PerlSub->new(
name => 'old',
code => sub { filter__git_get_projects_list(\@projects) },
),
Dumbbench::Instance::PerlSub->new(
name => 'new',
code => sub { filter_forks_from_projects_list(\@projects) },
),
);
$bench->run();
my %instances = map { $_->name() => $_ } $bench->instances();
printf "%4d %3d ", (scalar @projects), $nforked;
foreach my $name (qw(old new)) {
my $result = $instances{$name}->result();
print " ".$result->number();
print " ".$result->error()->[0];
}
print "\n";
}
#print Dumper($bench->instances());
#print Dumper($_->result()) foreach ($bench->instances());
print "\n";
1;
__END__
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
2011-04-29 17:51 ` [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo) Jakub Narebski
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
Jakub Narebski
The major change is removing the ability to edit content tags (ctags)
in a web browser.
The interface was created by gitweb, while actual editing of tags was
to be done by external script; the API was not defined, and neither
was provided example implementation. Such split is also a bit fragile
- interface and implementation have to be kept in sync. Gitweb
provided only ability to add tags; you could not edit tags nor delete
them.
Format of ctags is now described in the comment above git_get_project_ctags
subroutine. Gitweb now is more robust with respect to original ctags
format; it also accepts two new formats: $GIT_DIR/ctags file, with one
content tag per line, and multi-value `gitweb.ctag' config variable.
Gathering all ctags of all project is now put in git_gather_all_ctags
subroutine, making git_project_list_body more clear.
git_populate_project_tagcloud subroutine now generates data used for
tag cloud, including generation of ctag link, also in the case
HTML::TagCloud module is unavailable. Links are now generated using
href() subroutine - this is more robust, as ctags might contain '?',
';' and '=' special characters that need to be escaped in query param.
Shown tags are HTML-escaped.
The generation of tag cloud in git_show_project_tagcloud in the case
when HTML::TagCloud is not available is now changed slightly.
The 'content tags' field on project summary page is made more in line
with other fields in "projects_list" table. Because one cannot now
add new tags from web interface, this field is no longer displayed
when there are no content tags for given project.
Ctags-issue-Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Ctags-issue-Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was originally sent to git mailing list in
[RFC/PATCH] gitweb: Change the way "content tags" ('ctags') are handled
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/focus=168599
as part of
gitweb: cloud tags feature produces malformed XML for errors
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/focus=168266
thread, crossposted to git mailing list.
Hardening parsing of ctags files, so that gitweb does not crash on
malformed entries, but just ignores them, sort of fixes original issue
reported in
Bug#616005: libhtml-tagcloud-perl breaks gitweb
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/
http://bugs.debian.org/616005
by Uwe Kleine-König and crossposted to git mailing list by Jonathan
Nieder.
I say "sort of fixes" because this patch doesn't attempt to solve
deeper fundamental problem with error / exception handling in gitweb
after some data was already sent to a browser.
Note also that the bug was caused by mishandling (incorrectly
generating) underdocumented then 'ctags' feature.
gitweb/gitweb.perl | 141 +++++++++++++++++++++++++++++++++++----------------
1 files changed, 97 insertions(+), 44 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 014b33b..60cb772 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -412,20 +412,23 @@ our %feature = (
'override' => 0,
'default' => []},
- # Allow gitweb scan project content tags described in ctags/
- # of project repository, and display the popular Web 2.0-ish
- # "tag cloud" near the project list. Note that this is something
- # COMPLETELY different from the normal Git tags.
+ # Allow gitweb scan project content tags of project repository,
+ # and display the popular Web 2.0-ish "tag cloud" near the projects
+ # list. Note that this is something COMPLETELY different from the
+ # normal Git tags.
# gitweb by itself can show existing tags, but it does not handle
- # tagging itself; you need an external application for that.
- # For an example script, check Girocco's cgi/tagproj.cgi.
+ # tagging itself; you need to do it externally, outside gitweb.
+ # The format is described in git_get_project_ctags() subroutine.
# You may want to install the HTML::TagCloud Perl module to get
# a pretty tag cloud instead of just a list of tags.
# To enable system wide have in $GITWEB_CONFIG
- # $feature{'ctags'}{'default'} = ['path_to_tag_script'];
+ # $feature{'ctags'}{'default'} = [1];
# Project specific override is not supported.
+
+ # In the future whether ctags editing is enabled might depend
+ # on the value, but using 1 should always mean no editing of ctags.
'ctags' => {
'override' => 0,
'default' => [0]},
@@ -703,6 +706,7 @@ our @cgi_param_mapping = (
snapshot_format => "sf",
extra_options => "opt",
search_use_regexp => "sr",
+ ctag => "by_tag",
# this must be last entry (for manipulation from JavaScript)
javascript => "js"
);
@@ -2572,23 +2576,66 @@ sub git_get_project_description {
return $descr;
}
+# supported formats:
+# * $GIT_DIR/ctags/<tagname> file (in 'ctags' subdirectory)
+# - if its contents is a number, use it as tag weight,
+# - otherwise add a tag with weight 1
+# * $GIT_DIR/ctags file, each line is a tag (with weight 1)
+# the same value multiple times increases tag weight
+# * `gitweb.ctag' multi-valued repo config variable
sub git_get_project_ctags {
- my $path = shift;
+ my $project = shift;
my $ctags = {};
- $git_dir = "$projectroot/$path";
- opendir my $dh, "$git_dir/ctags"
- or return $ctags;
- foreach (grep { -f $_ } map { "$git_dir/ctags/$_" } readdir($dh)) {
- open my $ct, '<', $_ or next;
- my $val = <$ct>;
- chomp $val;
- close $ct;
- my $ctag = $_; $ctag =~ s#.*/##;
- $ctags->{$ctag} = $val;
+ $git_dir = "$projectroot/$project";
+ if (opendir my $dh, "$git_dir/ctags") {
+ my @files = grep { -f $_ } map { "$git_dir/ctags/$_" } readdir($dh);
+ foreach my $tagfile (@files) {
+ open my $ct, '<', $tagfile
+ or next;
+ my $val = <$ct>;
+ chomp $val if $val;
+ close $ct;
+
+ (my $ctag = $tagfile) =~ s#.*/##;
+ if ($val =~ /\d+/) {
+ $ctags->{$ctag} = $val;
+ } else {
+ $ctags->{$ctag} = 1;
+ }
+ }
+ closedir $dh;
+
+ } elsif (open my $fh, '<', "$git_dir/ctags") {
+ while (my $line = <$fh>) {
+ chomp $line;
+ $ctags->{$line}++ if $line;
+ }
+ close $fh;
+
+ } else {
+ my $taglist = config_to_multi(git_get_project_config('ctag'));
+ foreach my $tag (@$taglist) {
+ $ctags->{$tag}++;
+ }
}
- closedir $dh;
- $ctags;
+
+ return $ctags;
+}
+
+# return hash, where keys are content tags ('ctags'),
+# and values are sum of weights of given tag in every project
+sub git_gather_all_ctags {
+ my $projects = shift;
+ my $ctags = {};
+
+ foreach my $p (@$projects) {
+ foreach my $ct (keys %{$p->{'ctags'}}) {
+ $ctags->{$ct} += $p->{'ctags'}->{$ct};
+ }
+ }
+
+ return $ctags;
}
sub git_populate_project_tagcloud {
@@ -2608,31 +2655,41 @@ sub git_populate_project_tagcloud {
my $cloud;
if (eval { require HTML::TagCloud; 1; }) {
$cloud = HTML::TagCloud->new;
- foreach (sort keys %ctags_lc) {
+ foreach my $ctag (sort keys %ctags_lc) {
# Pad the title with spaces so that the cloud looks
# less crammed.
- my $title = $ctags_lc{$_}->{topname};
+ my $title = esc_html($ctags_lc{$ctag}->{topname});
$title =~ s/ / /g;
$title =~ s/^/ /g;
$title =~ s/$/ /g;
- $cloud->add($title, $home_link."?by_tag=".$_, $ctags_lc{$_}->{count});
+ $cloud->add($title, href(project=>undef, ctag=>$ctag),
+ $ctags_lc{$ctag}->{count});
}
} else {
- $cloud = \%ctags_lc;
+ $cloud = {};
+ foreach my $ctag (keys %ctags_lc) {
+ my $title = $ctags_lc{$ctag}->{topname};
+ $cloud->{$ctag}{count} = $ctags_lc{$ctag}->{count};
+ $cloud->{$ctag}{ctag} =
+ $cgi->a({-href=>href(project=>undef, ctag=>$ctag)},
+ esc_html($title, -nbsp=>1));
+ }
}
- $cloud;
+ return $cloud;
}
sub git_show_project_tagcloud {
my ($cloud, $count) = @_;
- print STDERR ref($cloud)."..\n";
if (ref $cloud eq 'HTML::TagCloud') {
return $cloud->html_and_css($count);
} else {
- my @tags = sort { $cloud->{$a}->{count} <=> $cloud->{$b}->{count} } keys %$cloud;
- return '<p align="center">' . join (', ', map {
- $cgi->a({-href=>"$home_link?by_tag=$_"}, $cloud->{$_}->{topname})
- } splice(@tags, 0, $count)) . '</p>';
+ my @tags = sort { $cloud->{$a}->{'count'} <=> $cloud->{$b}->{'count'} } keys %$cloud;
+ return
+ '<div id="htmltagcloud"'.($project ? '' : ' align="center"').'>' .
+ join (', ', map {
+ $cloud->{$_}->{'ctag'}
+ } splice(@tags, 0, $count)) .
+ '</div>';
}
}
@@ -4920,13 +4977,8 @@ sub git_project_list_body {
@projects = sort_projects_list(\@projects, $order);
if ($show_ctags) {
- my %ctags;
- foreach my $p (@projects) {
- foreach my $ct (keys %{$p->{'ctags'}}) {
- $ctags{$ct} += $p->{'ctags'}->{$ct};
- }
- }
- my $cloud = git_populate_project_tagcloud(\%ctags);
+ my $ctags = git_gather_all_ctags(\@projects);
+ my $cloud = git_populate_project_tagcloud($ctags);
print git_show_project_tagcloud($cloud, 64);
}
@@ -5521,13 +5573,14 @@ sub git_summary {
my $show_ctags = gitweb_check_feature('ctags');
if ($show_ctags) {
my $ctags = git_get_project_ctags($project);
- my $cloud = git_populate_project_tagcloud($ctags);
- print "<tr id=\"metadata_ctags\"><td>Content tags:<br />";
- print "</td>\n<td>" unless %$ctags;
- print "<form action=\"$show_ctags\" method=\"post\"><input type=\"hidden\" name=\"p\" value=\"$project\" />Add: <input type=\"text\" name=\"t\" size=\"8\" /></form>";
- print "</td>\n<td>" if %$ctags;
- print git_show_project_tagcloud($cloud, 48);
- print "</td></tr>";
+ if (%$ctags) {
+ # without ability to add tags, don't show if there are none
+ my $cloud = git_populate_project_tagcloud($ctags);
+ print "<tr id=\"metadata_ctags\">" .
+ "<td>content tags</td>" .
+ "<td>".git_show_project_tagcloud($cloud, 48)."</td>" .
+ "</tr>\n";
+ }
}
print "</table>\n";
--
1.7.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
2011-04-29 17:51 ` [PATCH 1/6 (v2)] gitweb: Restructure projects list generation Jakub Narebski
2011-04-29 17:51 ` [PATCH 2/6] gitweb: Change the way "content tags" ('ctags') are handled Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
2011-04-29 17:51 ` [PATCH 4/6] gitweb: Split git_project_list_body in two functions Jakub Narebski
` (3 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
Jakub Narebski
It might have been hard to discover that current view is limited to
projects with given content tag (ctag), as it was distinquished only
in gitweb URL. Mark matched contents tag in the tag cloud using
"match" class, for easier discovery.
This commit introduces a bit of further code duplication in
git_populate_project_tagcloud().
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Acked-by: Petr Baudis <pasky@suse.cz>
---
This patch was originally sent to git mailing list as
[PATCH 2/1] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/802865/focus=168731
(yes, it was crossposted).
gitweb/gitweb.perl | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 60cb772..e81c27d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2653,6 +2653,7 @@ sub git_populate_project_tagcloud {
}
my $cloud;
+ my $matched = $cgi->param('by_tag');
if (eval { require HTML::TagCloud; 1; }) {
$cloud = HTML::TagCloud->new;
foreach my $ctag (sort keys %ctags_lc) {
@@ -2662,17 +2663,22 @@ sub git_populate_project_tagcloud {
$title =~ s/ / /g;
$title =~ s/^/ /g;
$title =~ s/$/ /g;
+ if (defined $matched && $matched eq $ctag) {
+ $title = qq(<span class="match">$title</span>);
+ }
$cloud->add($title, href(project=>undef, ctag=>$ctag),
$ctags_lc{$ctag}->{count});
}
} else {
$cloud = {};
foreach my $ctag (keys %ctags_lc) {
- my $title = $ctags_lc{$ctag}->{topname};
+ my $title = esc_html($ctags_lc{$ctag}->{topname}, -nbsp=>1);
+ if (defined $matched && $matched eq $ctag) {
+ $title = qq(<span class="match">$title</span>);
+ }
$cloud->{$ctag}{count} = $ctags_lc{$ctag}->{count};
$cloud->{$ctag}{ctag} =
- $cgi->a({-href=>href(project=>undef, ctag=>$ctag)},
- esc_html($title, -nbsp=>1));
+ $cgi->a({-href=>href(project=>undef, ctag=>$ctag)}, $title);
}
}
return $cloud;
--
1.7.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/6] gitweb: Split git_project_list_body in two functions
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
` (2 preceding siblings ...)
2011-04-29 17:51 ` [PATCH 3/6] gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo) Jakub Narebski
@ 2011-04-29 17:51 ` Jakub Narebski
2011-04-29 17:52 ` [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic Jakub Narebski
` (2 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:51 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
Jakub Narebski
From: Sebastien Cevey <seb@cine7.net>
Extract the printing of project rows (body/contents of projects list
table) on the 'project_list' page into a separate git_project_list_rows
function. This makes it easier to reuse the code to print different
subsets of the whole project list.
[jn: Updated to post restructuring projects list generation]
Signed-off-by: Sebastien Cevey <seb@cine7.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch and subsequent patches were sent to git mailing list
(before rebasing on top of 'ctags' improvements) as
[RFC/PATCH 0/3] gitweb: Categories support
http://thread.gmane.org/gmane.comp.version-control.git/168616
This subseries is long in wait port of Sebastien Cevey series from
December 2008
http://thread.gmane.org/gmane.comp.version-control.git/102844
to modern gitweb. It was waiting for restructuring of projects list
generation that is part of this series.
Uwe, it might be an alternative to use of ctags (content tags, aka
project labels).
gitweb/gitweb.perl | 89 +++++++++++++++++++++++++++++-----------------------
1 files changed, 50 insertions(+), 39 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e81c27d..f8d5722 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4946,6 +4946,55 @@ sub format_sort_th {
return $sort_th;
}
+sub git_project_list_rows {
+ my ($projlist, $from, $to, $check_forks) = @_;
+
+ $from = 0 unless defined $from;
+ $to = $#$projlist if (!defined $to || $#$projlist < $to);
+
+ my $alternate = 1;
+ for (my $i = $from; $i <= $to; $i++) {
+ my $pr = $projlist->[$i];
+
+ if ($alternate) {
+ print "<tr class=\"dark\">\n";
+ } else {
+ print "<tr class=\"light\">\n";
+ }
+ $alternate ^= 1;
+
+ if ($check_forks) {
+ print "<td>";
+ if ($pr->{'forks'}) {
+ my $nforks = scalar @{$pr->{'forks'}};
+ if ($nforks > 0) {
+ print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks"),
+ -title => "$nforks forks"}, "+");
+ } else {
+ print $cgi->span({-title => "$nforks forks"}, "+");
+ }
+ }
+ print "</td>\n";
+ }
+ print "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
+ -class => "list"}, esc_html($pr->{'path'})) . "</td>\n" .
+ "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
+ -class => "list", -title => $pr->{'descr_long'}},
+ esc_html($pr->{'descr'})) . "</td>\n" .
+ "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
+ print "<td class=\"". age_class($pr->{'age'}) . "\">" .
+ (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
+ "<td class=\"link\">" .
+ $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary") . " | " .
+ $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
+ $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
+ $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
+ ($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
+ "</td>\n" .
+ "</tr>\n";
+ }
+}
+
sub git_project_list_body {
# actually uses global variable $project
my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
@@ -5001,47 +5050,9 @@ sub git_project_list_body {
print "<th></th>\n" . # for links
"</tr>\n";
}
- my $alternate = 1;
- for (my $i = $from; $i <= $to; $i++) {
- my $pr = $projects[$i];
- if ($alternate) {
- print "<tr class=\"dark\">\n";
- } else {
- print "<tr class=\"light\">\n";
- }
- $alternate ^= 1;
+ git_project_list_rows(\@projects, $from, $to, $check_forks);
- if ($check_forks) {
- print "<td>";
- if ($pr->{'forks'}) {
- my $nforks = scalar @{$pr->{'forks'}};
- if ($nforks > 0) {
- print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks"),
- -title => "$nforks forks"}, "+");
- } else {
- print $cgi->span({-title => "$nforks forks"}, "+");
- }
- }
- print "</td>\n";
- }
- print "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
- -class => "list"}, esc_html($pr->{'path'})) . "</td>\n" .
- "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
- -class => "list", -title => $pr->{'descr_long'}},
- esc_html($pr->{'descr'})) . "</td>\n" .
- "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
- print "<td class=\"". age_class($pr->{'age'}) . "\">" .
- (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
- "<td class=\"link\">" .
- $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary") . " | " .
- $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
- $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
- $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
- ($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
- "</td>\n" .
- "</tr>\n";
- }
if (defined $extra) {
print "<tr>\n";
if ($check_forks) {
--
1.7.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
` (3 preceding siblings ...)
2011-04-29 17:51 ` [PATCH 4/6] gitweb: Split git_project_list_body in two functions Jakub Narebski
@ 2011-04-29 17:52 ` Jakub Narebski
2011-04-29 17:52 ` [PATCH 6/6] gitweb: Optional grouping of projects by category Jakub Narebski
2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:52 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
Jakub Narebski
From: Sebastien Cevey <seb@cine7.net>
Introduce a git_get_file_or_project_config utility function to
retrieve a repository variable either from a plain text file in the
$GIT_DIR or else from 'gitweb.$variable' in the repository config
(e.g. 'description').
This would be used in next commit to retrieve category for a project,
which is to be stored in the same way as project description.
Signed-off-by: Sebastien Cevey <seb@cine7.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 24 ++++++++++++++++--------
1 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f8d5722..e8685ac 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2562,18 +2562,26 @@ sub git_get_path_by_hash {
## ......................................................................
## git utility functions, directly accessing git repository
-sub git_get_project_description {
- my $path = shift;
+# get the value of config variable either from file named as the variable
+# itself in the repository ($GIT_DIR/$name file), or from gitweb.$name
+# configuration variable in the repository config file.
+sub git_get_file_or_project_config {
+ my ($path, $name) = @_;
$git_dir = "$projectroot/$path";
- open my $fd, '<', "$git_dir/description"
- or return git_get_project_config('description');
- my $descr = <$fd>;
+ open my $fd, '<', "$git_dir/$name"
+ or return git_get_project_config($name);
+ my $conf = <$fd>;
close $fd;
- if (defined $descr) {
- chomp $descr;
+ if (defined $conf) {
+ chomp $conf;
}
- return $descr;
+ return $conf;
+}
+
+sub git_get_project_description {
+ my $path = shift;
+ return git_get_file_or_project_config($path, 'description');
}
# supported formats:
--
1.7.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 6/6] gitweb: Optional grouping of projects by category
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
` (4 preceding siblings ...)
2011-04-29 17:52 ` [PATCH 5/6] gitweb: Modularized git_get_project_description to be more generic Jakub Narebski
@ 2011-04-29 17:52 ` Jakub Narebski
2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
6 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 17:52 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Uwe Kleine-Koenig, Jonathan Nieder, Petr Baudis, Sebastien Cevey,
Jakub Narebski
From: Sebastien Cevey <seb@cine7.net>
This adds the $projects_list_group_categories option which, if enabled,
will result in grouping projects by category on the project list page.
The category is specified for each project by the $GIT_DIR/category file
or the 'gitweb.category' variable in its configuration file. By default,
projects are put in the $project_list_default_category category.
Note:
- Categories are always sorted alphabetically, with projects in
each category sorted according to the globally selected $order.
- When displaying a subset of all the projects (page limiting), the
category headers are only displayed for projects present on the page.
The feature is inspired from Sham Chukoury's patch for the XMMS2
gitweb, but has been rewritten for the current gitweb code. The CSS
for categories is inspired from Gustavo Sverzut Barbieri's patch to
group projects by path.
Thanks to Florian Ragwitz for Perl tips.
[jn: Updated to post restructuring projects list generation, fixed bugs,
added very basic test in t9500 that there are no warnings from Perl.]
Signed-off-by: Sebastien Cevey <seb@cine7.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/README | 16 ++++++++
gitweb/gitweb.perl | 62 ++++++++++++++++++++++++++++++--
gitweb/static/gitweb.css | 7 ++++
t/t9500-gitweb-standalone-no-errors.sh | 8 ++++
4 files changed, 90 insertions(+), 3 deletions(-)
diff --git a/gitweb/README b/gitweb/README
index a92bde7..a3a697b 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -207,6 +207,15 @@ not include variables usually directly set during build):
full description is available as 'title' attribute (usually shown on
mouseover). By default set to 25, which might be too small if you
use long project descriptions.
+ * $projects_list_group_categories
+ Enables the grouping of projects by category on the project list page.
+ The category of a project is determined by the $GIT_DIR/category
+ file or the 'gitweb.category' variable in its repository configuration.
+ Disabled by default.
+ * $project_list_default_category
+ Default category for projects for which none is specified. If set
+ to the empty string, such projects will remain uncategorized and
+ listed at the top, above categorized projects.
* @git_base_url_list
List of git base URLs used for URL to where fetch project from, shown
in project summary page. Full URL is "$git_base_url/$project".
@@ -314,6 +323,13 @@ You can use the following files in repository:
from the template during repository creation. You can use the
gitweb.description repo configuration variable, but the file takes
precedence.
+ * category (or gitweb.category)
+ Singe line category of a project, used to group projects if
+ $projects_list_group_categories is enabled. By default (file and
+ configuration variable absent), uncategorized projects are put in
+ the $project_list_default_category category. You can use the
+ gitweb.category repo configuration variable, but the file takes
+ precedence.
* cloneurl (or multiple-valued gitweb.url)
File with repository URL (used for clone and fetch), one per line.
Displayed in the project summary page. You can use multiple-valued
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e8685ac..f78fdd7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -115,6 +115,14 @@ our $projects_list = "++GITWEB_LIST++";
# the width (in characters) of the projects list "Description" column
our $projects_list_description_width = 25;
+# group projects by category on the projects list
+# (enabled if this variable evaluates to true)
+our $projects_list_group_categories = 0;
+
+# default category if none specified
+# (leave the empty string for no category)
+our $project_list_default_category = "";
+
# default order of projects list
# valid values are none, project, descr, owner, and age
our $default_projects_order = "project";
@@ -2584,6 +2592,12 @@ sub git_get_project_description {
return git_get_file_or_project_config($path, 'description');
}
+sub git_get_project_category {
+ my $path = shift;
+ return git_get_file_or_project_config($path, 'category');
+}
+
+
# supported formats:
# * $GIT_DIR/ctags/<tagname> file (in 'ctags' subdirectory)
# - if its contents is a number, use it as tag weight,
@@ -4877,8 +4891,9 @@ sub git_patchset_body {
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-# fills project list info (age, description, owner, forks) for each
-# project in the list, removing invalid projects from returned list
+# fills project list info (age, description, owner, category, forks)
+# for each project in the list, removing invalid projects from
+# returned list
# NOTE: modifies $projlist, but does not remove entries from it
sub fill_project_list_info {
my $projlist = shift;
@@ -4904,6 +4919,12 @@ sub fill_project_list_info {
if ($show_ctags) {
$pr->{'ctags'} = git_get_project_ctags($pr->{'path'});
}
+ if ($projects_list_group_categories && !defined $pr->{'category'}) {
+ my $cat = git_get_project_category($pr->{'path'}) ||
+ $project_list_default_category;
+ $pr->{'category'} = to_utf8($cat);
+ }
+
push @projects, $pr;
}
@@ -4931,6 +4952,23 @@ sub sort_projects_list {
return @projects;
}
+# returns a hash of categories, containing the list of project
+# belonging to each category
+sub build_projlist_by_category {
+ my ($projlist, $from, $to) = @_;
+ my %categories;
+
+ $from = 0 unless defined $from;
+ $to = $#$projlist if (!defined $to || $#$projlist < $to);
+
+ for (my $i = $from; $i <= $to; $i++) {
+ my $pr = $projlist->[$i];
+ push @{$categories{ $pr->{'category'} }}, $pr;
+ }
+
+ return wantarray ? %categories : \%categories;
+}
+
# print 'sort by' <th> element, generating 'sort by $name' replay link
# if that order is not selected
sub print_sort_th {
@@ -5059,7 +5097,25 @@ sub git_project_list_body {
"</tr>\n";
}
- git_project_list_rows(\@projects, $from, $to, $check_forks);
+ if ($projects_list_group_categories) {
+ # only display categories with projects in the $from-$to window
+ @projects = sort {$a->{'category'} cmp $b->{'category'}} @projects[$from..$to];
+ my %categories = build_projlist_by_category(\@projects, $from, $to);
+ foreach my $cat (sort keys %categories) {
+ unless ($cat eq "") {
+ print "<tr>\n";
+ if ($check_forks) {
+ print "<td></td>\n";
+ }
+ print "<td class=\"category\" colspan=\"5\">".esc_html($cat)."</td>\n";
+ print "</tr>\n";
+ }
+
+ git_project_list_rows($categories{$cat}, undef, undef, $check_forks);
+ }
+ } else {
+ git_project_list_rows(\@projects, $from, $to, $check_forks);
+ }
if (defined $extra) {
print "<tr>\n";
diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 79d7eeb..4df2d16 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -295,6 +295,13 @@ td.current_head {
text-decoration: underline;
}
+td.category {
+ background-color: #d9d8d1;
+ border-top: 1px solid #000000;
+ border-left: 1px solid #000000;
+ font-weight: bold;
+}
+
table.diff_tree span.file_status.new {
color: #008000;
}
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 71ef0ac..f5648a6 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -644,4 +644,12 @@ test_expect_success \
'ctags: search projects by non existent tag' \
'gitweb_run "by_tag=non-existent"'
+# ----------------------------------------------------------------------
+# categories
+
+test_expect_success \
+ 'categories: projects list, only default category' \
+ 'echo "\$projects_list_group_categories = 1;" >>gitweb_config.perl &&
+ gitweb_run'
+
test_done
--
1.7.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 0/6] gitweb: Improve ctags, introduce categories
2011-04-29 17:51 [PATCH 0/6] gitweb: Improve ctags, introduce categories Jakub Narebski
` (5 preceding siblings ...)
2011-04-29 17:52 ` [PATCH 6/6] gitweb: Optional grouping of projects by category Jakub Narebski
@ 2011-04-29 21:31 ` Junio C Hamano
2011-04-29 23:53 ` Jakub Narebski
2011-04-30 20:36 ` Øyvind A. Holm
6 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2011-04-29 21:31 UTC (permalink / raw)
To: Jakub Narebski
Cc: git, John 'Warthog9' Hawley,
John 'Warthog9' Hawley, Uwe Kleine-Koenig,
Jonathan Nieder, Petr Baudis, Sebastien Cevey
A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
"Subject" header, like this:
Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
The message actually has the above as a long single line, as can be seen
at http://article.gmane.org/gmane.comp.version-control.git/172479/raw
Just being curious.
The headers suggest that sending MUA was git-send-email speaking to gmail
SMTP. Did we introduce bugs to send-email recently?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/6] gitweb: Improve ctags, introduce categories
2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
@ 2011-04-29 23:53 ` Jakub Narebski
2011-04-30 20:36 ` Øyvind A. Holm
1 sibling, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-04-29 23:53 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, John 'Warthog9' Hawley,
John 'Warthog9' Hawley, Uwe Kleine-Koenig,
Jonathan Nieder, Petr Baudis, Sebastien Cevey
Junio C Hamano wrote:
> A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> "Subject" header, like this:
>
> Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
> way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
>
> The message actually has the above as a long single line, as can be seen
> at http://article.gmane.org/gmane.comp.version-control.git/172479/raw
>
> Just being curious.
>
> The headers suggest that sending MUA was git-send-email speaking to gmail
> SMTP. Did we introduce bugs to send-email recently?
This is git-send-email from git version 1.7.3 (not most recent).
What might be important, and what you can't get from mail itself, is that
patch was generated using git-format-patch, but UTF-8 characters in body
of email were introduced during editing it, and were not present in commit
message (sorry, bad practice). So git-send-email asked about encoding,
and I chosen default UTF-8
The following files are 8bit, but do not declare a Content-Transfer-Encoding.
mdir.7/0002-gitweb-Change-the-way-content-tags-ctags-are-handled.txt
Which 8bit encoding should I declare [UTF-8]? <ENTER>
Though subject itself does contain only 7bit US-ASCII, so there is no need
for encoding it (with quoted-printable, as Subject appears before
Content-Encoding and MIME-Version headers).
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/6] gitweb: Improve ctags, introduce categories
2011-04-29 21:31 ` [PATCH 0/6] gitweb: Improve ctags, introduce categories Junio C Hamano
2011-04-29 23:53 ` Jakub Narebski
@ 2011-04-30 20:36 ` Øyvind A. Holm
2011-05-03 14:02 ` git-send-email and non 7bit clean message (was: [PATCH 0/6] gitweb: Improve ctags, introduce categories) Jakub Narebski
1 sibling, 1 reply; 13+ messages in thread
From: Øyvind A. Holm @ 2011-04-30 20:36 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jakub Narebski, git, John 'Warthog9' Hawley,
John 'Warthog9' Hawley, Uwe Kleine-Koenig,
Jonathan Nieder, Petr Baudis, Sebastien Cevey
On 29 April 2011 23:31, Junio C Hamano <gitster@pobox.com> wrote:
> A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> "Subject" header, like this:
>
> Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
> way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
>
> The message actually has the above as a long single line, as can be seen at
> http://article.gmane.org/gmane.comp.version-control.git/172479/raw
>
> Just being curious.
This seems as the same thing that I reported on 2010-04-25 23:35:49Z,
<http://thread.gmane.org/gmane.comp.version-control.git/145774>. If there's a
character above U+007F in the log message below line #2, the Subject: line is
garbled. In this case it is, it's the "ö" in Uwe's name that leads to this
error.
A test to reproduce this is at <https://gist.github.com/378785>, but it seems
as this was fixed between v1.7.4.1-292-ge2a57aa and v1.7.4.1-343-ga91df69 ,
probably happened in dc7f96f (Merge branch 'jk/format-patch-multiline-header').
The patch at <http://article.gmane.org/gmane.comp.version-control.git/172479/raw>
was generated with git-1.7.3, so it would trigger the error in this case.
Regards,
Øyvind
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git-send-email and non 7bit clean message (was: [PATCH 0/6] gitweb: Improve ctags, introduce categories)
2011-04-30 20:36 ` Øyvind A. Holm
@ 2011-05-03 14:02 ` Jakub Narebski
2011-05-04 13:50 ` [PATCH/RFC] git-send-email: Do not encode Subject if not required (was: git-send-email and non 7bit clean message) Jakub Narebski
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2011-05-03 14:02 UTC (permalink / raw)
To: Øyvind A. Holm
Cc: Junio C Hamano, git, John 'Warthog9' Hawley,
John 'Warthog9' Hawley, Uwe Kleine-Koenig,
Jonathan Nieder, Petr Baudis, Sebastien Cevey, Greg Kroah-Hartman,
Ryan Anderson
On Sat, 30 Apr 2011, Øyvind A. Holm wrote:
> On 29 April 2011 23:31, Junio C Hamano <gitster@pobox.com> wrote:
> > A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> > "Subject" header, like this:
> >
> > Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
> > way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
> >
> > The message actually has the above as a long single line, as can be seen at
> > http://article.gmane.org/gmane.comp.version-control.git/172479/raw
> >
> > Just being curious.
>
> This seems as the same thing that I reported on 2010-04-25 23:35:49Z,
> <http://thread.gmane.org/gmane.comp.version-control.git/145774>. If there's a
> character above U+007F in the log message below line #2, the Subject: line is
> garbled. In this case it is, it's the "ö" in Uwe's name that leads to this
> error.
>
> A test to reproduce this is at <https://gist.github.com/378785>, but it seems
> as this was fixed between v1.7.4.1-292-ge2a57aa and v1.7.4.1-343-ga91df69 ,
> probably happened in dc7f96f (Merge branch 'jk/format-patch-multiline-header').
> The patch at <http://article.gmane.org/gmane.comp.version-control.git/172479/raw>
> was generated with git-1.7.3, so it would trigger the error in this case.
I have just upgraded git to 1.7.5, and unfortunately it still has the
same bug (note that UTF-8 character was introduced while editing patch,
so git-format-patch doesn't see it):
5369:[gitweb/web@git]# git send-email [...] --dry-run mdir.1/*.txt
The following files are 8bit, but do not declare a Content-Transfer-Encoding.
mdir.1/0001-gitweb-Prepare-for-splitting-gitweb.txt
Which 8bit encoding should I declare [UTF-8]? <ENTER>
Dry-Sent [PATCHv2 0/2] gitweb: Beginnings of splitting gitweb into modules
Dry-Sent =?UTF-8?q?=5BPATCHv2=201/2=20=28RFC=3F=29=5D=20gitweb=3A=20Prepare=20for=20splitting=20gitweb?=
Dry-Sent [PATCHv2 2/2 (PoC)] gitweb: Create Gitweb::Util module
Note that having
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
is not enough[1]. However the patch send is understood correctly by email
programs: see [PATCH 2/6] in this thread.
I have added
Content-Transfer-Encoding: 8bit
to mdir.1/0001-gitweb-Prepare-for-splitting-gitweb.txt, and now it works
all right.
5370:[gitweb/web@git]# git send-email [...] mdir.1/*.txt
Dry-Sent [PATCHv2 0/2] gitweb: Beginnings of splitting gitweb into modules
Dry-Sent [PATCHv2 1/2 (RFC?)] gitweb: Prepare for splitting gitweb
Dry-Sent [PATCHv2 2/2 (PoC)] gitweb: Create Gitweb::Util module
Footnotes:
^^^^^^^^^^
[1]: Note that git-send-email does something strange: first, the problem
is with Content-Transfer-Encoding, and git-send-email asks for 8bit
encoding, suggesting UTF-8, instead of asking for transfer encoding,
sugesting e.g. 8bit.
Second, from email headers I have added git-send-email should _know_ that
message uses UTF-8 encoding (though this is side issue, and probably result
of above).
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH/RFC] git-send-email: Do not encode Subject if not required (was: git-send-email and non 7bit clean message)
2011-05-03 14:02 ` git-send-email and non 7bit clean message (was: [PATCH 0/6] gitweb: Improve ctags, introduce categories) Jakub Narebski
@ 2011-05-04 13:50 ` Jakub Narebski
0 siblings, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2011-05-04 13:50 UTC (permalink / raw)
To: Øyvind A. Holm
Cc: Junio C Hamano, git, Greg Kroah-Hartman, Ryan Anderson
On Tue, 3 May 2011, Jakub Narebski wrote:
> On Sat, 30 Apr 2011, Øyvind A. Holm wrote:
> > On 29 April 2011 23:31, Junio C Hamano <gitster@pobox.com> wrote:
> > > A tangent. It is curious why [PATCH 2/6] alone ended up with an encoded
> > > "Subject" header, like this:
> > >
> > > Subject: =?UTF-8?q?=5BPATCH=202/6=5D=20gitweb=3A=20Change=20the=20
> > > way=20=22content=20tags=22=20=28=27ctags=27=29=20are=20handled?=
> > >
> > > The message actually has the above as a long single line, as can be seen at
> > > http://article.gmane.org/gmane.comp.version-control.git/172479/raw
> > >
> > > Just being curious.
> >
> > This seems as the same thing that I reported on 2010-04-25 23:35:49Z,
> > <http://thread.gmane.org/gmane.comp.version-control.git/145774>. If there's a
> > character above U+007F in the log message below line #2, the Subject: line is
> > garbled. In this case it is, it's the "ö" in Uwe's name that leads to this
> > error.
Note: this line was added "by hand" when editing patch, so commit message
that git-format-patch seen did not contain any non-ASCII characters.
Otherwise fir-format-patch would add correct MIME headers itself.
> > A test to reproduce this is at <https://gist.github.com/378785>, but it seems
> > as this was fixed between v1.7.4.1-292-ge2a57aa and v1.7.4.1-343-ga91df69 ,
> > probably happened in dc7f96f (Merge branch 'jk/format-patch-multiline-header').
> > The patch at <http://article.gmane.org/gmane.comp.version-control.git/172479/raw>
> > was generated with git-1.7.3, so it would trigger the error in this case.
>
> I have just upgraded git to 1.7.5, and unfortunately it still has the
> same bug (note that UTF-8 character was introduced while editing patch,
> so git-format-patch doesn't see it):
>
> 5369:[gitweb/web@git]# git send-email [...] --dry-run mdir.1/*.txt
> The following files are 8bit, but do not declare a Content-Transfer-Encoding.
> mdir.1/0001-gitweb-Prepare-for-splitting-gitweb.txt
> Which 8bit encoding should I declare [UTF-8]? <ENTER>
> Dry-Sent [PATCHv2 0/2] gitweb: Beginnings of splitting gitweb into modules
> Dry-Sent =?UTF-8?q?=5BPATCHv2=201/2=20=28RFC=3F=29=5D=20gitweb=3A=20Prepare=20for=20splitting=20gitweb?=
> Dry-Sent [PATCHv2 2/2 (PoC)] gitweb: Create Gitweb::Util module
>
> Note that having
>
> MIME-Version: 1.0
> Content-Type: text/plain; charset=utf-8
>
> is not enough[1].
Perhaps something like attached patch could be beginning of fixing
this issue.
> Footnotes:
> ^^^^^^^^^^
> [1]: Note that git-send-email does something strange: first, the problem
> is with Content-Transfer-Encoding, and git-send-email asks for 8bit
> encoding, suggesting UTF-8, instead of asking for transfer encoding,
> sugesting e.g. 8bit.
Actually git-send-email assumes 8bit if not stated, but it does not
parse 'Content-Type' header for encoding to be used, perhaps correctly
assuming that if required 'Content-Transfer-Encoding' is missing, then
'Content-Type' is most probably missing as well.
-- >8 ---------- >8 --
Subject: [PATCH] git-send-email: Do not encode Subject if not required
If "Subject:" header of an email does not contain non-ASCII characters,
then we don't need to RFC-2047 encode it, even if file had broken
encoding.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
git-send-email.perl | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/git-send-email.perl b/git-send-email.perl
index 98ab33a..1c6b1a8 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1281,7 +1281,9 @@ foreach my $t (@files) {
$body_encoding = $auto_8bit_encoding;
}
- if ($broken_encoding{$t} && !is_rfc2047_quoted($subject)) {
+ if ($broken_encoding{$t} &&
+ !is_rfc2047_quoted($subject) &&
+ $subject =~ /[^[:ascii:]]/) {
$subject = quote_rfc2047($subject, $auto_8bit_encoding);
}
--
1.7.5
^ permalink raw reply related [flat|nested] 13+ messages in thread