* [RFC/PATCH] gitweb: Simplify git_project_list_body
@ 2008-05-01 10:20 Jakub Narebski
2008-05-02 10:30 ` [RFC/PATCH] gitweb: Allow project description in project_index file Jakub Narebski
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-05-01 10:20 UTC
To: git
Extract filling project list info, i.e. adding age, description, owner
and forks information, into the fill_project_list_info() subroutine. This
is preparation for smart pagination and smart searching (if possible,
only calculate info for those projects which are shown).
Extract generating sortable table header cells into the
print_sort_th_str() and print_sort_th_num() subroutines, and the
print_sort_th() driver subroutine. This avoids repetition, and should
make further improvements (like JavaScript client-side sorting) easier.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is an RFC patch, the first patch of a series meant to improve
generating the list of projects, trying to reduce both the amount of work
the server has to do to generate the page, and the size of the page, to
reduce the amount of data sent and the amount of rendering the web
browser (client) has to do.
An RFC because:
* I don't quite like the naming of the print_sort_th*() subroutines, but
I could not think of a better name
* I wonder if there is a better way to pass arguments, and a better way
to deal with numeric sorting (age) and lexical sorting (the rest).
I'd like someone with better Perl knowledge than me to examine this
patch.
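A possible answer to the second question, offered only as a sketch under assumptions: describe each sortable column once in a table (sort key, header text, numeric or lexical comparison), so callers pass just the column name and the <=> versus cmp choice lives in one place. The %columns table and sort_projects() are hypothetical names, not existing gitweb code; the keys mirror the ones used in this patch (path, descr_long, owner, age), and HTML output is left out.

```perl
use strict;
use warnings;

# Hypothetical column table: order name => sort key, header, comparison type.
my %columns = (
    project => { key => 'path',       header => 'Project',     cmp => 'str' },
    descr   => { key => 'descr_long', header => 'Description', cmp => 'str' },
    owner   => { key => 'owner',      header => 'Owner',       cmp => 'str' },
    age     => { key => 'age',        header => 'Last Change', cmp => 'num' },
);

# Sort $projlist (array ref of project hashes) in place by column $order;
# unknown column names leave the list untouched.
sub sort_projects {
    my ($projlist, $order) = @_;
    my $col = $columns{$order} or return;
    my $key = $col->{'key'};
    if ($col->{'cmp'} eq 'num') {
        @$projlist = sort { $a->{$key} <=> $b->{$key} } @$projlist;
    } else {
        @$projlist = sort { $a->{$key} cmp $b->{$key} } @$projlist;
    }
    return $projlist;
}

my @projects = (
    { path => 'b.git', descr_long => 'beta',  owner => 'Bob',   age => 30 },
    { path => 'a.git', descr_long => 'alpha', owner => 'Alice', age => 100 },
    { path => 'c.git', descr_long => 'gamma', owner => 'Carol', age => 9 },
);
sort_projects(\@projects, 'age');
print join(' ', map { $_->{'path'} } @projects), "\n";   # c.git b.git a.git
```

Here 'age' sorts as 9, 30, 100; comparing with cmp instead would give 100, 30, 9, which is why age needs <=>. A header-printing loop could then iterate over the column names and take the header text from the same table.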
What is planned:
* Adding description to project_list format, so everything except age
(last changed) can be read by parsing a single file, making gitweb
performance better
* Paginate projects list in such a way that information such as age
is gathered and filled only for projects which are shown, if possible
(it is not possible for example if sorting by age)
* Add searching for projects, by project name (project path),
description, and owner. Make gitweb use the search page as the front
page if the number of projects is greater than some configurable number
What is planned in the future:
* Add project's categories support.
gitweb/gitweb.perl | 108 ++++++++++++++++++++++++++++-----------------------
1 files changed, 59 insertions(+), 49 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2facf2d..6a28dca 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3548,21 +3548,23 @@ sub git_patchset_body {
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-sub git_project_list_body {
- my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
-
- my ($check_forks) = gitweb_check_feature('forks');
-
+# fills project list info (age, description, owner, forks) for each
+# project in the list, removing invalid projects from returned list
+sub fill_project_list_info {
+ my ($projlist, $check_forks) = @_;
my @projects;
+
+ PROJECT:
foreach my $pr (@$projlist) {
- my (@aa) = git_get_last_activity($pr->{'path'});
- unless (@aa) {
- next;
+ my (@activity) = git_get_last_activity($pr->{'path'});
+ unless (@activity) {
+ next PROJECT;
}
- ($pr->{'age'}, $pr->{'age_string'}) = @aa;
+ ($pr->{'age'}, $pr->{'age_string'}) = @activity;
if (!defined $pr->{'descr'}) {
my $descr = git_get_project_description($pr->{'path'}) || "";
- $pr->{'descr_long'} = to_utf8($descr);
+ $descr = to_utf8($descr);
+ $pr->{'descr_long'} = $descr;
$pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
}
if (!defined $pr->{'owner'}) {
@@ -3574,14 +3576,50 @@ sub git_project_list_body {
($pname !~ /\/$/) &&
(-d "$projectroot/$pname")) {
$pr->{'forks'} = "-d $projectroot/$pname";
- }
- else {
+ } else {
$pr->{'forks'} = 0;
}
}
push @projects, $pr;
}
+ return @projects;
+}
+
+sub print_sort_th {
+ my ($str_sort, $name, $order, $key, $header, $projlist) = @_;
+ $key ||= $name;
+ $header ||= ucfirst($name);
+
+ if ($order eq $name) {
+ if ($str_sort) {
+ @$projlist = sort {$a->{$key} cmp $b->{$key}} @$projlist;
+ } else {
+ @$projlist = sort {$a->{$key} <=> $b->{$key}} @$projlist;
+ }
+ print "<th>$header</th>\n";
+ } else {
+ print "<th>" .
+ $cgi->a({-href => href(project=>undef, order=>$name),
+ -class => "header"}, $header) .
+ "</th>\n";
+ }
+}
+
+sub print_sort_th_str {
+ print_sort_th(1, @_);
+}
+
+sub print_sort_th_num {
+ print_sort_th(0, @_);
+}
+
+sub git_project_list_body {
+ my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
+
+ my ($check_forks) = gitweb_check_feature('forks');
+ my @projects = fill_project_list_info($projlist, $check_forks);
+
$order ||= $default_projects_order;
$from = 0 unless defined $from;
$to = $#projects if (!defined $to || $#projects < $to);
@@ -3592,43 +3630,15 @@ sub git_project_list_body {
if ($check_forks) {
print "<th></th>\n";
}
- if ($order eq "project") {
- @projects = sort {$a->{'path'} cmp $b->{'path'}} @projects;
- print "<th>Project</th>\n";
- } else {
- print "<th>" .
- $cgi->a({-href => href(project=>undef, order=>'project'),
- -class => "header"}, "Project") .
- "</th>\n";
- }
- if ($order eq "descr") {
- @projects = sort {$a->{'descr'} cmp $b->{'descr'}} @projects;
- print "<th>Description</th>\n";
- } else {
- print "<th>" .
- $cgi->a({-href => href(project=>undef, order=>'descr'),
- -class => "header"}, "Description") .
- "</th>\n";
- }
- if ($order eq "owner") {
- @projects = sort {$a->{'owner'} cmp $b->{'owner'}} @projects;
- print "<th>Owner</th>\n";
- } else {
- print "<th>" .
- $cgi->a({-href => href(project=>undef, order=>'owner'),
- -class => "header"}, "Owner") .
- "</th>\n";
- }
- if ($order eq "age") {
- @projects = sort {$a->{'age'} <=> $b->{'age'}} @projects;
- print "<th>Last Change</th>\n";
- } else {
- print "<th>" .
- $cgi->a({-href => href(project=>undef, order=>'age'),
- -class => "header"}, "Last Change") .
- "</th>\n";
- }
- print "<th></th>\n" .
+ print_sort_th_str('project', $order, 'path',
+ 'Project', \@projects);
+ print_sort_th_str('descr', $order, 'descr_long',
+ 'Description', \@projects);
+ print_sort_th_str('owner', $order, 'owner',
+ 'Owner', \@projects);
+ print_sort_th_num('age', $order, 'age',
+ 'Last Change', \@projects);
+ print "<th></th>\n" . # for links
"</tr>\n";
}
my $alternate = 1;
--
1.5.5
* [RFC/PATCH] gitweb: Allow project description in project_index file
2008-05-01 10:20 [RFC/PATCH] gitweb: Simplify git_project_list_body Jakub Narebski
@ 2008-05-02 10:30 ` Jakub Narebski
2008-05-02 13:04 ` Miklos Vajna
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-05-02 10:30 UTC
To: git; +Cc: Petr Baudis
Change format of $projects_list file from
<URI-encoded path> SPC <URI-encoded owner> LF
to
<URI-encoded path> SPC <URI-encoded owner> [SPC <description>] LF
with optional project description. To make it easier to read and edit
$projects_list file the description _is not_ URI encoded. Please
remember that only single line of repository (project) description is
supported. Note that SPC can be replaced by any whitespace
character.
This change required modifying git_get_projects_list() subroutine
(part when $projects_list is a file, not a directory to be scanned),
and git_get_project_description() subroutine.
The 'project_index' action creates projects list index file in the new
format, with project description.
This change is backwards compatible: an older gitweb with the new
projects list format would work as it used to; a new gitweb with the
old format would get (try to get) the project description from other
sources.
While at it some comments describing changes and changed subroutines
were added, and information about $projects_list format was updated in
gitweb/README.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is the second patch in the yet-to-be-written series improving the
projects list page and related things. I'm sending an early version of
the patch to ask for comments.
The reason why it is an RFC is the decision to _not_ URI-decode (to not
force URI-encoding of e.g. spaces) in the project (repository)
description part of projects list page. It makes projects index file
easier to read and to edit, but it closes possibility of extending this
format further. And there is another thing that could be reasonably put
in this file: future project *categories* support. So should I try to
add categories support first?
Note that without this patch, future patches dividing the projects list
page into 100-item pages and adding a projects search page, which are
meant to bring performance improvements, would bring smaller
improvements.
Comments appreciated...
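To make the proposed line format concrete, here is a rough parsing sketch. It assumes a minimal unescape() ('+' to space, then %XX to a character), written here as a stand-in rather than gitweb's own function; parse_projects_list_line() is likewise a hypothetical name. The important detail is the limit of 3 on split, which keeps the spaces inside the raw, non-encoded description.

```perl
use strict;
use warnings;

# Minimal stand-in for unescape(): '+' -> space first, then %XX -> character,
# so that an encoded '%2B' correctly decodes to a literal '+'.
sub unescape {
    my $str = shift;
    $str =~ s/\+/ /g;
    $str =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;
    return $str;
}

# Parse one line of the proposed format:
#   <URI-encoded path> SPC <URI-encoded owner> [SPC <raw description>]
sub parse_projects_list_line {
    my $line = shift;
    chomp $line;
    # split ' ' skips leading whitespace; limit 3 keeps the rest intact
    my ($path, $owner, $descr) = split(' ', $line, 3);
    return unless defined $path;
    return {
        path  => unescape($path),
        owner => defined $owner ? unescape($owner) : undef,
        descr => $descr,    # not URI-decoded in this version of the format
    };
}

my $pr = parse_projects_list_line("git/git.git Junio+C+Hamano The core git plumbing\n");
print "$pr->{path} | $pr->{owner} | $pr->{descr}\n";
# git/git.git | Junio C Hamano | The core git plumbing
```

A line in the old two-field format simply yields an undefined description, which is what lets git_get_project_description() fall back to other sources.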
gitweb/README | 9 +++++--
gitweb/gitweb.perl | 61 +++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 59 insertions(+), 11 deletions(-)
diff --git a/gitweb/README b/gitweb/README
index 8f7ea36..e58fe18 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -157,9 +157,12 @@ not include variables usually directly set during build):
* $projects_list
Source of projects list, either directory to scan, or text file
with list of repositories (in the "<URI-encoded repository path> SPC
- <URI-encoded repository owner>" format). Set to $GITWEB_LIST
- during installation. If empty, $projectroot is used to scan for
- repositories.
+ <URI-encoded repository owner>" line format, or optionally with
+ project description in "<URI-encoded repository path> SPC
+ <URI-encoded repository owner> SPC <single line description>";
+ actually there can be any whitespace in place of SPC). Set to
+ $GITWEB_LIST during installation. If empty, $projectroot is used
+ to scan for repositories.
* $my_url, $my_uri
URL and absolute URL of gitweb script; you might need to set those
variables if you are using 'pathinfo' feature: see also below.
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6a28dca..dc3f99a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1702,13 +1702,43 @@ sub git_get_path_by_hash {
## ......................................................................
## git utility functions, directly accessing git repository
+# this is helper function for git_get_project_description()
+sub git_get_project_description_from_list {
+ my $path = shift;
+
+ open(my $fd, $projects_list)
+ or return;
+ while (my $line = <$fd>) {
+ chomp $line;
+ my ($pr, undef, $descr) = split(' ', $line, 3);
+ $pr = unescape($pr);
+ if ($pr eq $path) {
+ close $fd;
+ return $descr;
+ }
+ }
+ close $fd;
+ return;
+}
+
+# sources for project description are
+# * $projects_list, if it is a file, and it uses the new format:
+# URI_encoded(<path>) SPC URI_encoded(<owner>) SPC <description> LF
+# * $GIT_DIR/description file in project repository, if it exists
+# * gitweb.description configuration variable for a project
sub git_get_project_description {
my $path = shift;
+ my $descr;
+
+ if (-f $projects_list) {
+ $descr = git_get_project_description_from_list($path);
+ return $descr if (defined $descr); # try other sources if needed
+ }
$git_dir = "$projectroot/$path";
open my $fd, "$git_dir/description"
or return git_get_project_config('description');
- my $descr = <$fd>;
+ $descr = <$fd>;
close $fd;
if (defined $descr) {
chomp $descr;
@@ -1774,20 +1804,24 @@ sub git_get_projects_list {
}, "$dir");
} elsif (-f $projects_list) {
- # read from file(url-encoded):
+ # read from file (whitespace separated, url-encoded):
# 'git%2Fgit.git Linus+Torvalds'
# 'libs%2Fklibc%2Fklibc.git H.+Peter+Anvin'
# 'linux%2Fhotplug%2Fudev.git Greg+Kroah-Hartman'
+ # optionally with description (which isn't url-encoded)
+ # 'git/git.git Junio+C+Hamano The core git plumbing'
+ # 'libs/klibc/klibc.git H.+Peter+Anvin klibc main development tree'
+ # 'linux/hotplug/udev.git Kay+Sievers udev development tree'
my %paths;
open my ($fd), $projects_list or return;
PROJECT:
while (my $line = <$fd>) {
chomp $line;
- my ($path, $owner) = split ' ', $line;
+ my ($path, $owner, $descr) = split(' ', $line, 3);
$path = unescape($path);
$owner = unescape($owner);
if (!defined $path) {
- next;
+ next PROJECT;
}
if ($filter ne '') {
# looking for forks;
@@ -1818,9 +1852,14 @@ sub git_get_projects_list {
}
if (check_export_ok("$projectroot/$path")) {
my $pr = {
- path => $path,
+ path => to_utf8($path),
owner => to_utf8($owner),
};
+ if (defined $descr) {
+ $descr = to_utf8($descr);
+ $pr->{'descr_long'} = $descr;
+ $pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
+ }
push @list, $pr;
(my $forks_path = $path) =~ s/\.git$//;
$paths{$forks_path}++;
@@ -4006,21 +4045,27 @@ sub git_project_index {
print $cgi->header(
-type => 'text/plain',
-charset => 'utf-8',
- -content_disposition => 'inline; filename="index.aux"');
+ -content_disposition => 'inline; filename="projects_index.aux"');
foreach my $pr (@projects) {
if (!exists $pr->{'owner'}) {
$pr->{'owner'} = git_get_project_owner("$pr->{'path'}");
+ $pr->{'owner'} = to_utf8($pr->{'owner'});
+ }
+ if (!exists $pr->{'descr_long'}) {
+ $pr->{'descr_long'} = git_get_project_description($pr->{'path'}) || "";
+ $pr->{'descr_long'} = to_utf8($pr->{'descr_long'});
}
- my ($path, $owner) = ($pr->{'path'}, $pr->{'owner'});
+ my ($path, $owner, $descr) =
+ ($pr->{'path'}, $pr->{'owner'}, $pr->{'descr_long'});
# quote as in CGI::Util::encode, but keep the slash, and use '+' for ' '
$path =~ s/([^a-zA-Z0-9_.\-\/ ])/sprintf("%%%02X", ord($1))/eg;
$owner =~ s/([^a-zA-Z0-9_.\-\/ ])/sprintf("%%%02X", ord($1))/eg;
$path =~ s/ /\+/g;
$owner =~ s/ /\+/g;
- print "$path $owner\n";
+ print "$path $owner $descr\n";
}
}
--
1.5.5
* Re: [RFC/PATCH] gitweb: Allow project description in project_index file
2008-05-02 10:30 ` [RFC/PATCH] gitweb: Allow project description in project_index file Jakub Narebski
@ 2008-05-02 13:04 ` Miklos Vajna
2008-05-03 9:03 ` Jakub Narebski
0 siblings, 1 reply; 20+ messages in thread
From: Miklos Vajna @ 2008-05-02 13:04 UTC
To: Jakub Narebski; +Cc: git, Petr Baudis
On Fri, May 02, 2008 at 12:30:04PM +0200, Jakub Narebski <jnareb@gmail.com> wrote:
> The reason why it is an RFC is the decision to _not_ URI-decode (to not
> force URI-encoding of e.g. spaces) in the project (repository)
> description part of projects list page. It makes projects index file
> easier to read and to edit, but it closes possibility of extending this
> format further. And there is another thing that could be reasonably put
> in this file: future project *categories* support. So should I try to
> add categories support first?
Just my two cents, I don't think that encoded strings are so unreadable.
Also, having the ability to extend the file later as well would be nice.
* [RFC/PATCH] gitweb: Allow project description in project_index file
2008-05-02 13:04 ` Miklos Vajna
@ 2008-05-03 9:03 ` Jakub Narebski
2008-05-04 2:03 ` Miklos Vajna
2008-05-09 13:23 ` [RFC/PATCH] gitweb: Project search Jakub Narebski
0 siblings, 2 replies; 20+ messages in thread
From: Jakub Narebski @ 2008-05-03 9:03 UTC
To: Miklos Vajna; +Cc: git, Petr Baudis
Change format of $projects_list file from
<URI-encoded path> SPC <URI-encoded owner> LF
to
<URI-encoded path> SPC
<URI-encoded owner> [SPC <URI-encoded description>] LF
with optional project description. Please remember that only single
line of repository (project) description is supported. Note that SPC
can be replaced by any whitespace character.
This change required modifying git_get_projects_list() subroutine
(part when $projects_list is a file, not a directory to be scanned),
and git_get_project_description() subroutine.
The 'project_index' action creates the projects list index file in the
new format, with project description. Also, it now does only the minimal
level of escaping / encoding.
While at it some comments describing changes and changed subroutines
were added, and information about $projects_list format was updated in
gitweb/README and in gitweb/INSTALL.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
On Fri, 2 May 2008, Miklos Vajna wrote:
> On Fri, May 02, 2008 at 12:30:04PM +0200, Jakub Narebski <jnareb@gmail.com> wrote:
>>
>> The reason why it is an RFC is the decision to _not_ URI-decode (to not
>> force URI-encoding of e.g. spaces) in the project (repository)
>> description part of projects list page. It makes projects index file
>> easier to read and to edit, but it closes possibility of extending this
>> format further. And there is another thing that could be reasonably put
>> in this file: future project *categories* support. So should I try to
>> add categories support first?
>
> Just my two cents, I don't think that encoded strings are so unreadable.
> Also, having the ability to extend the file later as well would be nice.
This version uses a URI-encoded 'description' field in the $projects_list
file, thus allowing it to be extended (e.g. by adding categories support)
in a backward compatible way.
Also, the format is described in more elaborate way; please comment...
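A small round-trip sketch of the minimal escaping used in this version's project_index hunk ('+' and '%' hex-escaped, spaces turned into '+', any remaining whitespace hex-escaped). encode_field() copies those three substitutions; decode_field() is an assumed inverse written for illustration, not gitweb's unescape().

```perl
use strict;
use warnings;

# Same three substitutions as the project_index hunk, in the same order.
sub encode_field {
    my $str = shift;
    $str =~ s/([+%])/sprintf("%%%02X", ord($1))/eg;        # '+' and '%' first
    $str =~ s/ /+/g;                                         # then spaces
    $str =~ s/([[:space:]])/sprintf("%%%02X", ord($1))/eg;  # other whitespace
    return $str;
}

# Assumed inverse: '+' -> space, then %XX (hexadecimal) -> character.
sub decode_field {
    my $str = shift;
    $str =~ s/\+/ /g;
    $str =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;
    return $str;
}

my @fields = ('git/git.git', 'Junio C Hamano', "C++ 100%\tcore");
my $line = join(' ', map { encode_field($_) } @fields);
print "$line\n";
# git/git.git Junio+C+Hamano C%2B%2B+100%25%09core
my @back = map { decode_field($_) } split(' ', $line);
```

Since no field contains literal whitespace after encoding, a plain split(' ', $line) recovers the fields, and further fields (categories, say) could be appended without ambiguity.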
gitweb/INSTALL | 10 +++++--
gitweb/README | 9 ++++--
gitweb/gitweb.perl | 74 ++++++++++++++++++++++++++++++++++++++++++---------
3 files changed, 74 insertions(+), 19 deletions(-)
diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index f7194db..7e25a7f 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -140,9 +140,13 @@ Gitweb repositories
Each line of the projects list file should consist of the url-encoded path
to the project repository database (relative to projectroot), followed
- by the url-encoded project owner on the same line (separated by a space).
- Spaces in both project path and project owner have to be encoded as either
- '%20' or '+'.
+ by the url-encoded project owner on the same line (separated by a space),
+ and optionally followed by the url-encoded project description (separated
+ by space). Spaces in project path, project owner and project description
+ have to be encoded as either '%20' or '+'. Other whitespace (separator),
+ plus sign '+' (used as replacement for spaces), and percent sign '%' (used
+ for encoding / escaping) have to be url-encoded, i.e. replaced by '%'
+ followed by the two-digit character code in hexadecimal.
You can generate the projects list index file using the project_index
action (the 'TXT' link on projects list page) directly from gitweb.
diff --git a/gitweb/README b/gitweb/README
index 8f7ea36..130449c 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -157,9 +157,12 @@ not include variables usually directly set during build):
* $projects_list
Source of projects list, either directory to scan, or text file
with list of repositories (in the "<URI-encoded repository path> SPC
- <URI-encoded repository owner>" format). Set to $GITWEB_LIST
- during installation. If empty, $projectroot is used to scan for
- repositories.
+ <URI-encoded repository owner>" line format, or optionally with
+ project description in "<URI-encoded repository path> SPC
+ <URI-encoded repository owner> SPC <URI-encoded description>";
+ actually there can be any sequence of whitespace in place of SPC).
+ Set to $GITWEB_LIST during installation. If empty, $projectroot is
+ used to scan for repositories.
* $my_url, $my_uri
URL and absolute URL of gitweb script; you might need to set those
variables if you are using 'pathinfo' feature: see also below.
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6a28dca..296bfaa 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1702,13 +1702,44 @@ sub git_get_path_by_hash {
## ......................................................................
## git utility functions, directly accessing git repository
+# this is helper function for git_get_project_description()
+sub git_get_project_description_from_list {
+ my $path = shift;
+
+ open(my $fd, $projects_list)
+ or return;
+ while (my $line = <$fd>) {
+ chomp $line;
+ my ($proj, undef, $descr) = split ' ', $line;
+ $proj = unescape($proj);
+ $descr = unescape($descr);
+ if ($proj eq $path) {
+ close $fd;
+ return $descr;
+ }
+ }
+ close $fd;
+ return;
+}
+
+# sources for project description are
+# * $projects_list, if it is a file, and it uses the new format:
+# <encoded path> SPC <encoded owner> SPC <encoded description> LF
+# * $GIT_DIR/description file in project repository, if it exists
+# * gitweb.description configuration variable for a project
sub git_get_project_description {
my $path = shift;
+ my $descr;
+
+ if (-f $projects_list) {
+ $descr = git_get_project_description_from_list($path);
+ return $descr if (defined $descr); # try other sources if needed
+ }
$git_dir = "$projectroot/$path";
open my $fd, "$git_dir/description"
or return git_get_project_config('description');
- my $descr = <$fd>;
+ $descr = <$fd>;
close $fd;
if (defined $descr) {
chomp $descr;
@@ -1774,20 +1805,25 @@ sub git_get_projects_list {
}, "$dir");
} elsif (-f $projects_list) {
- # read from file(url-encoded):
+ # read from file (whitespace separated, url-encoded):
# 'git%2Fgit.git Linus+Torvalds'
# 'libs%2Fklibc%2Fklibc.git H.+Peter+Anvin'
# 'linux%2Fhotplug%2Fudev.git Greg+Kroah-Hartman'
+ # optionally with description
+ # 'git/git.git Junio+C+Hamano The+core+git+plumbing'
+ # 'libs/klibc/klibc.git H.+Peter+Anvin klibc+main+development+tree'
+ # 'linux/hotplug/udev.git Kay+Sievers udev+development+tree'
my %paths;
open my ($fd), $projects_list or return;
PROJECT:
while (my $line = <$fd>) {
chomp $line;
- my ($path, $owner) = split ' ', $line;
+ my ($path, $owner, $descr) = split ' ', $line;
$path = unescape($path);
$owner = unescape($owner);
+ $descr = unescape($descr);
if (!defined $path) {
- next;
+ next PROJECT;
}
if ($filter ne '') {
# looking for forks;
@@ -1818,9 +1854,14 @@ sub git_get_projects_list {
}
if (check_export_ok("$projectroot/$path")) {
my $pr = {
- path => $path,
+ path => to_utf8($path),
owner => to_utf8($owner),
};
+ if (defined $descr) {
+ $descr = to_utf8($descr);
+ $pr->{'descr_long'} = $descr;
+ $pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
+ }
push @list, $pr;
(my $forks_path = $path) =~ s/\.git$//;
$paths{$forks_path}++;
@@ -4006,21 +4047,28 @@ sub git_project_index {
print $cgi->header(
-type => 'text/plain',
-charset => 'utf-8',
- -content_disposition => 'inline; filename="index.aux"');
+ -content_disposition => 'inline; filename="projects_index.aux"');
foreach my $pr (@projects) {
if (!exists $pr->{'owner'}) {
$pr->{'owner'} = git_get_project_owner("$pr->{'path'}");
+ $pr->{'owner'} = to_utf8($pr->{'owner'});
+ }
+ if (!exists $pr->{'descr_long'}) {
+ $pr->{'descr_long'} = git_get_project_description($pr->{'path'}) || "";
+ $pr->{'descr_long'} = to_utf8($pr->{'descr_long'});
}
- my ($path, $owner) = ($pr->{'path'}, $pr->{'owner'});
- # quote as in CGI::Util::encode, but keep the slash, and use '+' for ' '
- $path =~ s/([^a-zA-Z0-9_.\-\/ ])/sprintf("%%%02X", ord($1))/eg;
- $owner =~ s/([^a-zA-Z0-9_.\-\/ ])/sprintf("%%%02X", ord($1))/eg;
- $path =~ s/ /\+/g;
- $owner =~ s/ /\+/g;
+ my @pr_info =
+ ($pr->{'path'}, $pr->{'owner'}, $pr->{'descr_long'});
+ foreach (@pr_info) {
+ # quote only minimal set, only what has to be quoted
+ s/([+%])/sprintf("%%%02X", ord($1))/eg;
+ s/ /\+/g;
+ s/([[:space:]])/sprintf("%%%02X", ord($1))/eg;
+ }
- print "$path $owner\n";
+ print join(' ', @pr_info)."\n";
}
}
--
1.5.5
* Re: [RFC/PATCH] gitweb: Allow project description in project_index file
2008-05-03 9:03 ` Jakub Narebski
@ 2008-05-04 2:03 ` Miklos Vajna
2008-05-09 13:23 ` [RFC/PATCH] gitweb: Project search Jakub Narebski
1 sibling, 0 replies; 20+ messages in thread
From: Miklos Vajna @ 2008-05-04 2:03 UTC
To: Jakub Narebski; +Cc: git, Petr Baudis
On Sat, May 03, 2008 at 11:03:13AM +0200, Jakub Narebski <jnareb@gmail.com> wrote:
> Also, the format is described in more elaborate way; please comment...
Not that my opinion matters here too much, but I like it. :-)
* [RFC/PATCH] gitweb: Project search
2008-05-03 9:03 ` Jakub Narebski
2008-05-04 2:03 ` Miklos Vajna
@ 2008-05-09 13:23 ` Jakub Narebski
2008-05-10 9:28 ` [RFC/PATCH] gitweb: Paginate project list Jakub Narebski
1 sibling, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-05-09 13:23 UTC
To: git; +Cc: Petr Baudis, J.H., Lea Wiemann
If the number of projects is greater than some number, currently set to
1 for testing (in production probably 100), then the projects listing
is replaced by a project search form. One can search by project name
(project path relative to $projectroot), by project description, and
by owner. There is also a link to show all projects.
Before searching by some field, the information we search for must be
filled in. For this, fill_project_list_info() was enhanced to take an
additional parameter specifying which part of the project info to fill.
This way we can limit doing expensive calculations (like running
git-for-each-ref to get 'age' / "Last changed" info) only to the
projects which we will show as search results.
To make sorting by a given column work for both the project list and
the project search results, the 'sort by' links in the print_sort_th()
subroutine are generated as 'replay' links, which keep the current
search parameters.
NOTE: currently match is _not_ highlighted in search results.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch depends on earlier
"[RFC/PATCH] gitweb: Simplify git_project_list_body"
patch, and may (but should not) depend textually on
"[RFC/PATCH] gitweb: Allow project description in project_index file"
As this is a quick'n'dirty proof-of-concept patch, the styling of the
project search form is done in HTML instead of in CSS.
This patch should reduce both server load and browser load for sites
with very large number of projects.
NOTE 1: in production one would increase threshold above which project
search is used instead of listing all projects.
NOTE 2: I think that for effective caching one would need to cache
data, not final output (to J.H. and Lea Wiemann).
What do you think about this concept?
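A stand-alone sketch of the filtering step, outside gitweb. The key mapping is the one from filter_project_list(); building the pattern with quotemeta unless the 're' checkbox is set is an assumption about how $search_regexp would be derived, and filter_projects() is a hypothetical name.

```perl
use strict;
use warnings;

# Search type -> project info key, as in filter_project_list().
my %keyname = (project => 'path', descr => 'descr_long', owner => 'owner');

# Return the projects whose field matches $searchtext, case-insensitively.
# Without $use_regexp the text is matched literally (quotemeta); this is an
# assumption about how the search regexp is built, not taken from the patch.
sub filter_projects {
    my ($projlist, $searchtype, $searchtext, $use_regexp) = @_;
    my $key = $keyname{$searchtype} or return;
    my $re = $use_regexp ? $searchtext : quotemeta($searchtext);
    return grep { defined $_->{$key} && $_->{$key} =~ m/$re/i } @$projlist;
}

my @projects = (
    { path => 'git/git.git',    descr_long => 'The core git plumbing', owner => 'Junio C Hamano' },
    { path => 'libs/klibc.git', descr_long => 'klibc main tree',       owner => 'H. Peter Anvin' },
);
my @hits = filter_projects(\@projects, 'owner', 'h. peter');
print scalar(@hits), " match: $hits[0]{path}\n";   # 1 match: libs/klibc.git
```

With a user-supplied regexp the pattern may fail to compile, so real code would probably want to guard the match (for example with eval) instead of letting it die.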
gitweb/gitweb.perl | 92 ++++++++++++++++++++++++++++++++++++++++++++-------
1 files changed, 79 insertions(+), 13 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index b117364..7e1a9b4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -472,7 +472,7 @@ if (defined $page) {
our $searchtype = $cgi->param('st');
if (defined $searchtype) {
- if ($searchtype =~ m/[^a-z]/) {
+ if ($searchtype =~ m/[^a-z_]/) {
die_error(undef, "Invalid searchtype parameter");
}
}
@@ -3589,29 +3589,61 @@ sub git_patchset_body {
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
+sub git_project_search_form {
+ my ($projlist, $searchtype, $searchtext, $search_use_regexp) = @_;
+
+ $searchtype ||= 'descr';
+ print "\n<!-- START: project_list search form -->\n" .
+ "<div style=\"text-align: center; margin-top: 20px;\">\n";
+ print $cgi->startform(-method => 'get', -action => $my_uri) .
+ $cgi->hidden(-name => 'a', -value => 'project_list') . "\n" .
+ $cgi->popup_menu(-name => 'st', -default => $searchtype,
+ -values => ['project', 'descr', 'owner'],
+ -labels => {'project' => 'Project',
+ 'descr' => 'Description',
+ 'owner' => 'Owner'}) . "\n" .
+ $cgi->textfield(-name => 's', -value => $searchtext,
+ -size => 60) . "\n" .
+ "<span title=\"Extended regular expression\">" .
+ $cgi->checkbox(-name => 'sr', -value => 1, -label => 're',
+ -checked => $search_use_regexp) .
+ "</span>\n" .
+ $cgi->submit(-name => 'btnS', -value => 'Search') .
+ $cgi->end_form() . "\n" .
+ $cgi->a({-href => href(project => undef, searchtype => 'list_all')},
+ 'List all projects') . "\n";
+ print "</div>\n" .
+ "<!-- END: project_list search form -->\n\n";
+}
+
# fills project list info (age, description, owner, forks) for each
-# project in the list, removing invalid projects from returned list
+# project in the list, removing invalid projects from returned list,
+# or fill only specified info (removing only when filling 'age')
sub fill_project_list_info {
- my ($projlist, $check_forks) = @_;
+ my ($projlist, $check_forks, $fill_only) = @_;
my @projects;
PROJECT:
foreach my $pr (@$projlist) {
- my (@activity) = git_get_last_activity($pr->{'path'});
- unless (@activity) {
- next PROJECT;
- }
- ($pr->{'age'}, $pr->{'age_string'}) = @activity;
- if (!defined $pr->{'descr'}) {
+ if (!exists $pr->{'age'} &&
+ (!defined $fill_only || $fill_only eq 'age')) {
+ my (@activity) = git_get_last_activity($pr->{'path'});
+ next PROJECT unless (@activity);
+ ($pr->{'age'}, $pr->{'age_string'}) = @activity;
+ }
+ if (!defined $pr->{'descr'} &&
+ (!defined $fill_only || $fill_only eq 'descr')) {
my $descr = git_get_project_description($pr->{'path'}) || "";
$descr = to_utf8($descr);
$pr->{'descr_long'} = $descr;
$pr->{'descr'} = chop_str($descr, $projects_list_description_width, 5);
}
- if (!defined $pr->{'owner'}) {
+ if (!defined $pr->{'owner'} &&
+ (!defined $fill_only || $fill_only eq 'owner')) {
$pr->{'owner'} = git_get_project_owner("$pr->{'path'}") || "";
}
- if ($check_forks) {
+ if ($check_forks &&
+ (!defined $fill_only || $fill_only eq 'forks')) {
my $pname = $pr->{'path'};
if (($pname =~ s/\.git$//) &&
($pname !~ /\/$/) &&
@@ -3627,6 +3659,25 @@ sub fill_project_list_info {
return @projects;
}
+# show only projects which match what we search for
+sub filter_project_list {
+ my ($projlist, $searchtype, $search_regexp) = @_;
+ my %keyname = (
+ 'project' => 'path',
+ 'descr' => 'descr_long',
+ 'owner' => 'owner',
+ );
+ my $key = $keyname{$searchtype};
+
+ # fill in the field we search on
+ @$projlist = fill_project_list_info($projlist, 0, $searchtype);
+ # filter projects list
+ @$projlist = grep
+ { $_->{$key} =~ m/$search_regexp/i; } @$projlist;
+
+ return $projlist;
+}
+
sub print_sort_th {
my ($str_sort, $name, $order, $key, $header, $projlist) = @_;
$key ||= $name;
@@ -3641,7 +3692,7 @@ sub print_sort_th {
print "<th>$header</th>\n";
} else {
print "<th>" .
- $cgi->a({-href => href(project=>undef, order=>$name),
+ $cgi->a({-href => href(-replay => 1, order=>$name),
-class => "header"}, $header) .
"</th>\n";
}
@@ -4005,6 +4056,10 @@ sub git_project_list {
if (defined $order && $order !~ m/none|project|descr|owner|age/) {
die_error(undef, "Unknown order parameter");
}
+ if (defined $searchtype &&
+ $searchtype !~ m/^(project|descr|owner|list_all)$/x) {
+ die_error(undef, "Unknown searchtype parameter");
+ }
my @list = git_get_projects_list();
if (!@list) {
@@ -4019,7 +4074,18 @@ sub git_project_list {
close $fd;
print "</div>\n";
}
- git_project_list_body(\@list, $order);
+ # 'defined $searchtype' serves as "was search performed" test
+ if (@list > 1 || defined $searchtype) {
+ git_project_search_form(\@list, $searchtype,
+ $searchtext, $search_use_regexp);
+ if (defined $searchtype) {
+ filter_project_list(\@list, $searchtype, $search_regexp)
+ unless ($searchtype eq 'list_all');
+ git_project_list_body(\@list, $order);
+ }
+ } else {
+ git_project_list_body(\@list, $order);
+ }
git_footer_html();
}
--
1.5.5
* [RFC/PATCH] gitweb: Paginate project list
2008-05-09 13:23 ` [RFC/PATCH] gitweb: Project search Jakub Narebski
@ 2008-05-10 9:28 ` Jakub Narebski
2008-05-10 18:28 ` J.H.
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-05-10 9:28 UTC
To: git; +Cc: Petr Baudis, J.H.
Divide projects list and project search results into pages of
$page_size (currently 100) elements. Pagination links look
like shown below:
first prev 1 2 3 *4* 5 6 next last
where *4* denotes current page, is in bold, and isn't a link.
This would need to be improved for extremely large number of projects
(repositories), where number of projects divided by page size is very
large.
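The navigation described above can be sketched as a pure function that returns the items and the page each one links to. This version produces plain text only. The patch itself wraps the items in $cgi->a() links built with href(-replay=>1, page=>...) and joins them with ' ⋅ ' rather than a space. Page indices are 0-based internally and shown 1-based, as in the patch. paging_items() and paging_text() are illustrative names.

```perl
use strict;
use warnings;

# Returns [label, target_page] pairs; target_page is undef for items
# shown as plain text instead of as links. $page is 0-based.
sub paging_items {
    my ($page, $npages) = @_;
    my @items = (['first', 0],
                 ['prev', $page > 0 ? $page - 1 : undef]);
    for my $pg (0 .. $npages - 1) {
        # page numbers are shown to the user as 1..$npages
        push @items, [$pg + 1, $pg == $page ? undef : $pg];
    }
    push @items, ['next', $page < $npages - 1 ? $page + 1 : undef],
                 ['last', $npages - 1];
    return @items;
}

# Renders the items as in the commit message: the current page is
# marked as *N* (in the patch it is printed in bold and not linked).
sub paging_text {
    my @labels;
    for my $item (paging_items(@_)) {
        my ($label, $target) = @$item;
        push @labels, (!defined $target && $label =~ /^\d+$/)
            ? "*$label*" : $label;
    }
    return join(' ', @labels);
}

print paging_text(3, 6), "\n";   # prints: first prev 1 2 3 *4* 5 6 next last
```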
Currently pagination interacts with sorting in a somewhat strange way.
When clicking on a "sort by" link while on some later page, the result
is not this page sorted, but the given page of the sorted output.
Perhaps "sort by" links should always show the first page...
The git_project_list_body() subroutine takes care to fill the field
(key) we sort by for all projects, sort by the given ordering, and fill
the rest of the project info only for the projects shown.
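The two-phase fill can be illustrated with a small sketch. It assumes projects are hashrefs and that the caller supplies two callbacks: one that fills only the sort key, and one that fills everything else. page_of_projects() and both callbacks are hypothetical. In the patch the same work is done by calling fill_project_list_info() twice, first with $order for all projects and then on the displayed slice. Age is compared numerically here and every other key lexically.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the projects shown on one page.
# $fill_key->($project, $order) must set $project->{$order};
# $fill_rest->($project) fills the remaining (expensive) fields.
sub page_of_projects {
    my ($projlist, $order, $from, $to, $fill_key, $fill_rest) = @_;

    # phase 1: the sort key is needed for every project
    $fill_key->($_, $order) for @$projlist;
    my @sorted;
    if ($order eq 'age') {
        @sorted = sort { $a->{age} <=> $b->{age} } @$projlist;
    } else {
        @sorted = sort { $a->{$order} cmp $b->{$order} } @$projlist;
    }

    # clamp the requested range to the list, as git_project_list_body does
    $to = $#sorted if $to > $#sorted;
    return () if $from > $to;

    # phase 2: fill the rest only for the projects actually displayed
    my @shown = @sorted[$from .. $to];
    $fill_rest->($_) for @shown;
    return @shown;
}
```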
NOTE: currently the code is not optimal, as it does a few unnecessary
copies of arrays or array fragments.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch depends on the earlier "gitweb: Project search", but conceptually
they are independent; probably the fill_project_list_info() improvement to
do partial filling should be in a separate commit...
gitweb/gitweb.perl | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 51 insertions(+), 6 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7e1a9b4..716f7ad 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3709,10 +3709,10 @@ sub print_sort_th_num {
sub git_project_list_body {
my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
+ $order ||= $default_projects_order;
my ($check_forks) = gitweb_check_feature('forks');
- my @projects = fill_project_list_info($projlist, $check_forks);
+ my @projects = fill_project_list_info($projlist, $check_forks, $order);
- $order ||= $default_projects_order;
$from = 0 unless defined $from;
$to = $#projects if (!defined $to || $#projects < $to);
@@ -3733,9 +3733,10 @@ sub git_project_list_body {
print "<th></th>\n" . # for links
"</tr>\n";
}
+ @projects = @projects[$from..$to];
+ @projects = fill_project_list_info(\@projects, $check_forks);
my $alternate = 1;
- for (my $i = $from; $i <= $to; $i++) {
- my $pr = $projects[$i];
+ foreach my $pr (@projects) {
if ($alternate) {
print "<tr class=\"dark\">\n";
} else {
@@ -4066,6 +4067,46 @@ sub git_project_list {
die_error(undef, "No projects found");
}
+ my $page_size = 100;
+ # +1 for incomplete page (not having $page_size items)
+ my $npages = int(@list / $page_size)+((@list % $page_size) > 0);
+ # pages are numbered 0..$npages-1
+ $page = 0 if (!defined $page || $page < 0);
+ $page = $npages-1 if ($page >= $npages);
+ my $paging_nav;
+
+ if ($npages > 1) {
+ my @paging = ();
+ push @paging, $cgi->a({-href => href(-replay=>1, page=>0)},
+ "first");
+ if ($page > 0) {
+ push @paging, $cgi->a({-href => href(-replay=>1, page=>$page-1),
+ -accesskey => "p", -title => "Alt-p"},
+ "prev");
+ } else {
+ push @paging, "prev";
+ }
+ for (my $pg = 0; $pg < $npages; $pg++) {
+ # links to individual pages are numbered 1..$npages
+ if ($page == $pg) {
+ push @paging, '<b>'.($pg+1).'</b>';
+ } else {
+ push @paging, $cgi->a({-href => href(-replay=>1, page=>$pg)},
+ $pg+1);
+ }
+ }
+ if ($page < $npages-1) {
+ push @paging, $cgi->a({-href => href(-replay=>1, page=>$page+1),
+ -accesskey => "n", -title => "Alt-n"},
+ "next");
+ } else {
+ push @paging, "next";
+ }
+ push @paging, $cgi->a({-href => href(-replay=>1, page=>$npages-1)},
+ "last");
+ $paging_nav = join(' ⋅ ', @paging);
+ }
+
git_header_html();
if (-f $home_text) {
print "<div class=\"index_include\">\n";
@@ -4081,10 +4122,14 @@ sub git_project_list {
if (defined $searchtype) {
filter_project_list(\@list, $searchtype, $search_regexp)
unless ($searchtype eq 'list_all');
- git_project_list_body(\@list, $order);
+ git_project_list_body(\@list, $order,
+ $page*$page_size, ($page+1)*$page_size - 1,
+ $paging_nav);
}
} else {
- git_project_list_body(\@list, $order);
+ git_project_list_body(\@list, $order,
+ $page*$page_size, ($page+1)*$page_size - 1,
+ $paging_nav);
}
git_footer_html();
}
--
1.5.5
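The page arithmetic in the git_project_list() hunk can be checked in isolation with a minimal sketch (page_window() is an illustrative name). The page count is a ceiling division, written here as int(($n + $size - 1) / $size). For positive page sizes this equals the patch's int(@list / $page_size) + ((@list % $page_size) > 0). $page is clamped to 0..$npages-1, and the upper index is clamped to the last element. Note also that, as posted, $npages is computed before filter_project_list() runs, so for search results the page count appears to reflect the unfiltered list.

```perl
use strict;
use warnings;

# Returns ($npages, $page, $from, $to) for $nitems items.
sub page_window {
    my ($nitems, $page_size, $page) = @_;
    # empty list: no pages, and $from > $to means nothing to show
    return (0, 0, 0, -1) if $nitems == 0;
    # ceiling division, same value as the patch's
    # int(@list / $page_size) + ((@list % $page_size) > 0)
    my $npages = int(($nitems + $page_size - 1) / $page_size);
    # pages are numbered 0..$npages-1; out-of-range requests are clamped
    $page = 0 if (!defined $page || $page < 0);
    $page = $npages - 1 if ($page >= $npages);
    my $from = $page * $page_size;
    my $to   = ($page + 1) * $page_size - 1;
    $to = $nitems - 1 if $to > $nitems - 1;
    return ($npages, $page, $from, $to);
}

my @w = page_window(250, 100, 7);
print "@w\n";   # prints: 3 2 200 249
```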
* Re: [RFC/PATCH] gitweb: Paginate project list
2008-05-10 9:28 ` [RFC/PATCH] gitweb: Paginate project list Jakub Narebski
@ 2008-05-10 18:28 ` J.H.
2008-05-10 22:32 ` Jakub Narebski
0 siblings, 1 reply; 20+ messages in thread
From: J.H. @ 2008-05-10 18:28 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Petr Baudis
$page_size should be moved to a configuration directive, with a possible
default of 100, and there should be an option to turn it off completely
(I.E. kernel.org will likely have it turned off).
It might also be advantageous to generate the entire list and stash that
somewhere, (session on the host side?) and run the pagination out of
that. Would even allow the user to choose how many items they wish to
see from a drop down maybe?
- John
* Re: [RFC/PATCH] gitweb: Paginate project list
2008-05-10 18:28 ` J.H.
@ 2008-05-10 22:32 ` Jakub Narebski
2008-05-11 5:53 ` J.H.
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-05-10 22:32 UTC (permalink / raw)
To: J.H.; +Cc: git, Petr Baudis
On Sun, 10 May 2008, J.H. <warthog19@eaglescrag.net> wrote:
> $page_size should be moved to a configuration directive, with a possible
> default of 100, and there should be an option to turn it off completely
> (I.E. kernel.org will likely have it turned off).
If it would be configurable (and I agree that it is probably a good
idea), it should be per gitweb instance (i.e. global variable), and
should perhaps be named $project_list_page_size to distinguish it from
number of items shown (page size) for 'shortlog', 'log' and 'history'
views.
I guess that value of 0, or perhaps of undef would mean to not divide
projects list, or project search results, into pages.
BTW. why do you think that kernel.org would be better to have project
list pagination turned off?
> It might also be advantageous to generate the entire list and stash that
> somewhere, (session on the host side?) and run the pagination out of
> that.
First, when using the $projectslist file in the new format (second patch in
the series, "gitweb: Allow project description in project_index file"),
most of the data (well, all except age) would be filled by parsing a single file.
Second, the idea is to cache results of filled in @$projlist e.g. using
Storable, i.e. cache Perl data and not final HTML output.
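One way to cache the filled-in project list as Perl data is Storable's nstore() and retrieve(). The sketch below assumes a freshness check based on the cache file's modification time and a $fill callback that does the expensive work. cached_project_list() and the file layout are hypothetical, not gitweb code. Writing to a temporary file and renaming it keeps readers from seeing a half-written cache.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical: return the project list from $cache_file if it is younger
# than $max_age seconds, otherwise rebuild it with $fill and store it.
sub cached_project_list {
    my ($cache_file, $max_age, $fill) = @_;
    if (-f $cache_file && time() - (stat $cache_file)[9] < $max_age) {
        my $data = eval { retrieve($cache_file) };
        return @$data if ref $data eq 'ARRAY';     # cache hit
    }
    my @projects = $fill->();                      # cache miss
    my $tmp = "$cache_file.tmp.$$";
    # network byte order, so the cache is portable between machines
    nstore(\@projects, $tmp) or die "cannot store $tmp";
    rename($tmp, $cache_file) or die "rename $tmp: $!";
    return @projects;
}

# Example: the second call is answered from the cache file.
my $dir    = tempdir(CLEANUP => 1);
my $file   = File::Spec->catfile($dir, 'projects.storable');
my $calls  = 0;
my $fill   = sub { $calls++; return ({ path => 'git.git', age => 42 }) };
my @first  = cached_project_list($file, 300, $fill);
my @second = cached_project_list($file, 300, $fill);
```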
> Would even allow the user to choose how many items they wish to
> see from a drop down maybe?
I think it is also a good idea, at least for search results.
NOTE that this was an RFC patch, meant mainly to provide comments on
what to improve...
--
Jakub Narebski
Poland
* Re: [RFC/PATCH] gitweb: Paginate project list
2008-05-10 22:32 ` Jakub Narebski
@ 2008-05-11 5:53 ` J.H.
2008-05-11 23:51 ` Jakub Narebski
[not found] ` <8c5c35580805102356p7e5532aah319af921f9b19392@mail.gmail.com>
0 siblings, 2 replies; 20+ messages in thread
From: J.H. @ 2008-05-11 5:53 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Petr Baudis
On Sun, 2008-05-11 at 00:32 +0200, Jakub Narebski wrote:
> On Sun, 10 May 2008, J.H. <warthog19@eaglescrag.net> wrote:
>
> > $page_size should be moved to a configuration directive, with a possible
> > default of 100, and there should be an option to turn it off completely
> > (I.E. kernel.org will likely have it turned off).
>
> If it would be configurable (and I agree that it is probably a good
> idea), it should be per gitweb instance (i.e. global variable), and
> should perhaps be named $project_list_page_size to distinguish it from
> number of items shown (page size) for 'shortlog', 'log' and 'history'
> views.
>
> I guess that value of 0, or perhaps of undef would mean to not divide
> projects list, or project search results, into pages.
>
> BTW. why do you think that kernel.org would be better to have project
> list pagination turned off?
If we have caching on I'd rather leave it as a single page, much easier
to search on and with firefox /<project or name> then <ctrl>+g a couple
of times is really fast to find what I need. Also means less to cache
(good for us) and faster access for most. I'm going to guess leaving it
as a single page, unless it gets *very* unwieldy, will be how we leave
it.
>
> > It might also be advantageous to generate the entire list and stash that
> > somewhere, (session on the host side?) and run the pagination out of
> > that.
>
> First, when using the $projectslist file in the new format (second patch in
> the series, "gitweb: Allow project description in project_index file"),
> most of the data (well, all except age) would be filled by parsing a single file.
>
> Second, the idea is to cache results of filled in @$projlist e.g. using
> Storable, i.e. cache Perl data and not final HTML output.
I approve of that plan, caching all the html is kinda expensive *hides
the 20G of gitweb cache he has*
>
> > Would even allow the user to choose how many items they wish to
> > see from a drop down maybe?
>
> I think it is also a good idea, at least for search results.
>
>
> NOTE that this was an RFC patch, meant mainly to provide comments on
> what to improve...
I know, that's why I'm chiming in with my thoughts, I like the idea and
bonus points go to the user configurable display on that. There are
definitely sites that can benefit from this and likely at some point
k.org is going to have to paginate - just not sure when that point
is going to get hit.
- John 'Warthog9' Hawley
* Re: [RFC/PATCH] gitweb: Paginate project list
2008-05-11 5:53 ` J.H.
@ 2008-05-11 23:51 ` Jakub Narebski
[not found] ` <8c5c35580805102356p7e5532aah319af921f9b19392@mail.gmail.com>
1 sibling, 0 replies; 20+ messages in thread
From: Jakub Narebski @ 2008-05-11 23:51 UTC (permalink / raw)
To: J.H.; +Cc: git, Petr Baudis
On Sun, 11 May 2008, J.H. wrote:
>
> On Sun, 2008-05-11 at 00:32 +0200, Jakub Narebski wrote:
>> On Sun, 10 May 2008, J.H. <warthog19@eaglescrag.net> wrote:
>>
>> BTW. why do you think that kernel.org would be better to have project
>> list pagination turned off?
>
> If we have caching on I'd rather leave it as a single page, much easier
> to search on and with firefox /<project or name> then <ctrl>+g a couple
> of times is really fast to find what I need. Also means less to cache
> (good for us) and faster access for most. I'm going to guess leaving it
> as a single page, unless it gets *very* unwieldy, will be how we leave
> it.
First, you can cache individual pages of the projects list output with
small overhead if the number of projects per page is large enough. Second,
see my earlier patch adding project search capability to gitweb
(it is server-side search, though). Searching needs caching data, not
output, while pagination can be done with either.
As to a single-page projects list being unwieldy... You should
take into account how large the page is (having page size in megabytes
isn't too good), how much time it takes to download (and not only
generate), and how much time it takes for web browser to render it.
How many repositories is "too many", that I'm not sure about...
>>> It might also be advantageous to generate the entire list and stash that
>>> somewhere, (session on the host side?) and run the pagination out of
>>> that.
>>
>> First, when using the $projectslist file in the new format (second patch in
>> the series, "gitweb: Allow project description in project_index file"),
>> most of the data (well, all except age) would be filled by parsing a single file.
>>
>> Second, the idea is to cache results of filled in @$projlist e.g. using
>> Storable, i.e. cache Perl data and not final HTML output.
>
> I approve of that plan, caching all the html is kinda expensive *hides
> the 20G of gitweb cache he has*
Caching output (HTML) has the advantage of being simple, and avoiding
time to generate page. Caching data has the advantage of taking less
disk space due to being able to reuse the same data for different pages.
--
Jakub Narebski
Poland
* Re: [RFC/PATCH] gitweb: Paginate project list
[not found] ` <8c5c35580805102356p7e5532aah319af921f9b19392@mail.gmail.com>
@ 2008-05-12 7:03 ` Jakub Narebski
2008-05-12 15:43 ` Lars Hjemli
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-05-12 7:03 UTC (permalink / raw)
To: Lars Hjemli; +Cc: J.H., git, Petr Baudis
[The original email by Lars didn't get to git mailing list because of
lack of quotes around J.H. in "J.H." <warthog19@eaglescrag.net>
email address in Cc:]
On Sunday, 11 May 2008 08:56, Lars Hjemli wrote:
> On 5/11/08, J.H. <warthog19@eaglescrag.net> wrote:
>> On Sun, 2008-05-11 at 00:32 +0200, Jakub Narebski wrote:
>>>
>>> First, when using the $projectslist file in the new format (second patch in
>>> the series, "gitweb: Allow project description in project_index file"),
>>> most of the data (well, all except age) would be filled by parsing a single file.
>>>
>>> Second, the idea is to cache results of filled in @$projlist e.g. using
>>> Storable, i.e. cache Perl data and not final HTML output.
>>
>> I approve of that plan, caching all the html is kinda expensive *hides
>> the 20G of gitweb cache he has*
>
> It seems to me that "projectlist in a single file" and "cache results
> of filled in @$projlist" are different solutions to the same problem:
> rapidly filling a perl datastructure.
Well, yes and no. "Projectlist in single file" is about _static_ data
(which changes only if projects are added, deleted, its description
changed; those are usually rare events), and avoiding mainly I/O and
not CPU (scanning filesystem for repositories, reading config and
description, etc.).
"Cache data" is about caching _variable_ data, such as "Last changed"
information for project. Caching data instead of caching output
(caching HTML) allows to share cache for different presentation of
the very same data (e.g. 'history'/'shortlog' vs 'rss'). And for some
pages, like project search results, caching HTML output doesn't make
much sense, while caching data does.
> In cgit I've chosen "projectlist in a single file" and "cache html
> output". This makes it cheap (in terms of cpu and io) to both generate
> and serve the cached page (and the cache works for all pages).
As I said, for some pages, like search results, caching output
doesn't make sense, while caching data does.
> This used to be expensive in terms of cache size (similar to k.org's
> 20G), but current cgit solves this by treating the cache as a hash
> table; cgitrc has an option to set the cache size (number of files),
> each filename is generated as `hash(url) % cachesize` and each file
> contains the full url (to detect hash collisions) followed by the
> cached content for that url (see
> http://hjemli.net/git/cgit/tree/cache.c for the details).
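For comparison, the scheme described above can be sketched in a few lines of Perl: a fixed number of slot files, a slot chosen as hash(url) % cachesize, and the full URL stored on the first line of each file so that a collision reads as a miss. Using MD5 (Digest::MD5) as the hash is an assumption for illustration; cgit's cache.c may use a different hash function. slot_file(), cache_store() and cache_fetch() are hypothetical names.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# The slot file for $url: hash(url) % cachesize (MD5 is an assumption).
sub slot_file {
    my ($dir, $cachesize, $url) = @_;
    my $hash = hex(substr(md5_hex($url), 0, 8));   # 32-bit value
    return File::Spec->catfile($dir, $hash % $cachesize);
}

# Overwrites whatever the slot held before: a collision simply evicts.
sub cache_store {
    my ($dir, $cachesize, $url, $content) = @_;
    my $file = slot_file($dir, $cachesize, $url);
    open(my $fh, '>', $file) or die "cannot write $file: $!";
    print {$fh} "$url\n", $content;    # first line: the full URL
    close($fh) or die "cannot close $file: $!";
    return;
}

# Returns the cached content, or undef on a miss (empty slot, or a slot
# holding a different URL).
sub cache_fetch {
    my ($dir, $cachesize, $url) = @_;
    my $file = slot_file($dir, $cachesize, $url);
    open(my $fh, '<', $file) or return;
    my $stored_url = <$fh>;
    return unless defined $stored_url;
    chomp $stored_url;
    return if $stored_url ne $url;
    local $/;                            # slurp the rest of the file
    my $content = <$fh>;
    return defined $content ? $content : '';
}

my $dir = tempdir(CLEANUP => 1);
cache_store($dir, 1024, '/?p=git.git;a=summary', '<html>summary</html>');
```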
I guess that is the simplest solution, but I don't think that is
the best solution to have size-limited cache. For example CPAN Perl
module Cache::SizeAwareCache and its derivatives use the following
algorithm
The default cache size limiting algorithm works by removing cache
objects in the following order until the desired limit is reached:
1) objects that have expired
2) objects that are least recently accessed
3) objects that expire next
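The eviction order quoted above can be sketched as a function that picks the keys to remove until the total size fits the limit. This is a simplified illustration, not the Cache::SizeAwareCache code. Entries are assumed to be hashrefs with size, expires_at and accessed_at fields. In this sketch the third criterion (soonest expiry) only breaks ties left by the first two.

```perl
use strict;
use warnings;

# Returns the keys to evict, in order, so that the remaining total size
# is at most $limit. $now is the current time in epoch seconds.
sub keys_to_evict {
    my ($entries, $limit, $now) = @_;
    my $total = 0;
    $total += $_->{size} for values %$entries;

    my $expired = sub { $entries->{ $_[0] }{expires_at} <= $now ? 1 : 0 };
    my @order = sort {
        # 1) expired entries first
        $expired->($b) <=> $expired->($a)
        # 2) then least recently accessed
        || $entries->{$a}{accessed_at} <=> $entries->{$b}{accessed_at}
        # 3) then those that expire next
        || $entries->{$a}{expires_at} <=> $entries->{$b}{expires_at}
        # finally by key, to make the order deterministic
        || $a cmp $b
    } keys %$entries;

    my @evict;
    for my $key (@order) {
        last if $total <= $limit;
        push @evict, $key;
        $total -= $entries->{$key}{size};
    }
    return @evict;
}
```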
BTW. if majority of your clients support transparent compression
(J.H., could you check it for kernel.org; Pasky, could you check
it for repo.or.cz?) then you can reduce cache size by storing pages
compressed.
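Storing pages compressed could look like the sketch below, which uses the core IO::Compress::Gzip and IO::Uncompress::Gunzip modules. The stored bytes are sent unchanged with Content-Encoding: gzip to clients that accept it, and decompressed only for the others. compress_page() and serve_cached_page() are hypothetical, and the Accept-Encoding test is deliberately naive (it ignores q-values such as gzip;q=0).

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Compress a generated page once, before storing it in the cache.
sub compress_page {
    my ($html) = @_;
    my $gz;
    gzip(\$html => \$gz) or die "gzip failed: $GzipError";
    return $gz;
}

# Returns (extra_header, body) for a cached, compressed page.
sub serve_cached_page {
    my ($gz, $accept_encoding) = @_;
    if (defined $accept_encoding && $accept_encoding =~ /\bgzip\b/) {
        return ('Content-Encoding: gzip', $gz);   # send stored bytes as-is
    }
    my $html;
    gunzip(\$gz => \$html) or die "gunzip failed: $GunzipError";
    return (undef, $html);
}
```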
> Btw: gitweb and cgit seem to acquire the same features these days:
> cgit recently got pagination + search on the project list.
I haven't checked what features cgit has lately...
Gitweb development seems a bit stalled; I got no response to the latest
round of the gitweb TODO and wishlist...
--
Jakub Narebski
Poland
* Re: [RFC/PATCH] gitweb: Paginate project list
2008-05-12 7:03 ` Jakub Narebski
@ 2008-05-12 15:43 ` Lars Hjemli
2008-05-13 6:55 ` Jakub Narebski
0 siblings, 1 reply; 20+ messages in thread
From: Lars Hjemli @ 2008-05-12 15:43 UTC (permalink / raw)
To: Jakub Narebski; +Cc: warthog19, git, Petr Baudis
On 5/12/08, Jakub Narebski <jnareb@gmail.com> wrote:
> [The original email by Lars didn't get to git mailing list because of
> lack of quotes around J.H. in "J.H." <warthog19@eaglescrag.net>
> email address in Cc:]
Gaah, bad gmail...
> On Sunday, 11 May 2008 08:56, Lars Hjemli wrote:
>
> > It seems to me that "projectlist in a single file" and "cache results
> > of filled in @$projlist" are different solutions to the same problem:
> > rapidly filling a perl datastructure.
>
> Well, yes and no. "Projectlist in single file" is about _static_ data
> (which changes only if projects are added, deleted, its description
> changed; those are usually rare events), and avoiding mainly I/O and
> not CPU (scanning filesystem for repositories, reading config and
> description, etc.).
>
> "Cache data" is about caching _variable_ data, such as "Last changed"
> information for project. Caching data instead of caching output
> (caching HTML) allows to share cache for different presentation of
> the very same data (e.g. 'history'/'shortlog' vs 'rss'). And for some
> pages, like project search results, caching HTML output doesn't make
> much sense, while caching data does.
While I agree that caching search result output almost never makes
sense, I think it's more important that cache hits require minimal
processing. This is why I've chosen to cache the final result instead
of an intermediate state, but both solutions obviously have some pros
and cons.
> > This used to be expensive in terms of cache size (similar to k.org's
> > 20G), but current cgit solves this by treating the cache as a hash
> > table; cgitrc has an option to set the cache size (number of files),
> > each filename is generated as `hash(url) % cachesize` and each file
> > contains the full url (to detect hash collisions) followed by the
> > cached content for that url (see
> > http://hjemli.net/git/cgit/tree/cache.c for the details).
>
>
> I guess that is the simplest solution, but I don't think that is
> the best solution to have size-limited cache. For example CPAN Perl
> module Cache::SizeAwareCache and its derivatives use the following
> algorithm
>
> The default cache size limiting algorithm works by removing cache
> objects in the following order until the desired limit is reached:
>
> 1) objects that have expired
> 2) objects that are least recently accessed
> 3) objects that expire next
Again, minimal processing is the goal of cgit's cache implementation,
hence the simple solution.
> > Btw: gitweb and cgit seem to acquire the same features these days:
> > cgit recently got pagination + search on the project list.
>
>
> I haven't checked what features cgit has lately...
>
> Gitweb development seems a bit stalled; I got no response to the latest
> round of the gitweb TODO and wishlist...
Well, I for one found the wishlist interesting; I've been pondering on
implementing a graphic log in cgit (inspired by git-forest and
git-graph), but I refuse to perform a topo-sort ;-)
Hopefully I can exploit the fact that cgit never uses more than one
commit as starting point for log traversal, combined with heuristics
on commit date, to enable a fast graphic log that will be correct for
all but the most pathological cases.
--
larsh
* Re: [RFC/PATCH] gitweb: Paginate project list
2008-05-12 15:43 ` Lars Hjemli
@ 2008-05-13 6:55 ` Jakub Narebski
[not found] ` <8c5c35580805130939m1a1ef8e0yd72402f3c79190ea@mail.gmail.com>
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-05-13 6:55 UTC (permalink / raw)
To: Lars Hjemli; +Cc: git, Petr Baudis, J.H.
On Monday, 12 May 2008 17:43, Lars Hjemli wrote:
> On 5/12/08, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Sunday, 11 May 2008 08:56, Lars Hjemli wrote:
>>
>>> It seems to me that "projectlist in a single file" and "cache results
>>> of filled in @$projlist" are different solutions to the same problem:
>>> rapidly filling a perl datastructure.
>>
>> Well, yes and no. "Projectlist in single file" is about _static_ data
>> (which changes only if projects are added, deleted, its description
>> changed; those are usually rare events), and avoiding mainly I/O and
>> not CPU (scanning filesystem for repositories, reading config and
>> description, etc.).
>>
>> "Cache data" is about caching _variable_ data, such as "Last changed"
>> information for project. Caching data instead of caching output
>> (caching HTML) allows to share cache for different presentation of
>> the very same data (e.g. 'history'/'shortlog' vs 'rss'). And for some
>> pages, like project search results, caching HTML output doesn't make
>> much sense, while caching data does.
>
> While I agree that caching search result output almost never makes
> sense, I think it's more important that cache hits require minimal
> processing. This is why I've chosen to cache the final result instead
> of an intermediate state, but both solutions obviously have some pros
> and cons.
True. In most cases caching the final output is enough. Only in some
cases is caching data the better solution. I hope that "Gitweb caching"
Git's Google Summer of Code 2008 project would examine this in more
detail.
But please take into account that gitweb performance, and I guess any
git web interface performance, is I/O bound and not CPU bound (at least
according to what I remember from J.H.'s emails). So a little more
processing is, I think, less important than avoiding hitting the repositories.
J.H. (kernel.org) gitweb from what I remember does adaptive caching
of HTML output, while Pasky (repo.or.cz) gitweb does data caching only
for projects list page.
>>> This used to be expensive in terms of cache size (similar to k.org's
>>> 20G), but current cgit solves this by treating the cache as a hash
>>> table; cgitrc has an option to set the cache size (number of files),
>>> each filename is generated as `hash(url) % cachesize` and each file
>>> contains the full url (to detect hash collisions) followed by the
>>> cached content for that url (see
>>> http://hjemli.net/git/cgit/tree/cache.c for the details).
>>
>>
>> I guess that is the simplest solution, but I don't think that is
>> the best solution to have size-limited cache. For example CPAN Perl
>> module Cache::SizeAwareCache and its derivatives use the following
>> algorithm
>>
>> The default cache size limiting algorithm works by removing cache
>> objects in the following order until the desired limit is reached:
>>
>> 1) objects that have expired
>> 2) objects that are least recently accessed
>> 3) objects that expire next
>
> Again, minimal processing is the goal of cgit's cache implementation,
> hence the simple solution.
I would really like it if some computer scientist could calculate the
amortized cost of this solution and, what I think is more important, the
worst-case cost and the probability of hitting the worst case or something
close to it.
By the way, you have to take into account the time it takes to
calculate the hash when comparing performance. Note that for an LRU cache
you can use a heap / priority queue, or a splay (self-organizing) binary
tree.
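As an illustration of the heap / priority queue option, here is a minimal LRU cache sketch in Perl. Each access pushes [stamp, key] onto a binary min-heap keyed by a logical clock. On eviction, heap entries whose stamp is no longer the key's latest are skipped as stale. HeapLRU is a hypothetical class. A production version would also need to rebuild the heap occasionally, since stale entries accumulate on every access. The splay tree alternative is not shown.

```perl
use strict;
use warnings;

package HeapLRU;

sub new {
    my ($class, $capacity) = @_;
    return bless {
        cap   => $capacity,
        clock => 0,      # logical time, bumped on every access
        data  => {},     # key => value
        stamp => {},     # key => time of its latest access
        heap  => [],     # min-heap of [stamp, key], may hold stale entries
    }, $class;
}

# Sift a new [stamp, key] pair up the heap.
sub _push {
    my ($self, $item) = @_;
    my $h = $self->{heap};
    push @$h, $item;
    my $i = $#$h;
    while ($i > 0) {
        my $p = int(($i - 1) / 2);
        last if $h->[$p][0] <= $h->[$i][0];
        @$h[$p, $i] = @$h[$i, $p];
        $i = $p;
    }
    return;
}

# Remove and return the pair with the smallest stamp (undef if empty).
sub _pop {
    my ($self) = @_;
    my $h = $self->{heap};
    return unless @$h;
    my $top  = $h->[0];
    my $last = pop @$h;
    return $top unless @$h;
    $h->[0] = $last;
    my $i = 0;
    while (1) {
        my ($l, $r, $m) = (2 * $i + 1, 2 * $i + 2, $i);
        $m = $l if $l < @$h && $h->[$l][0] < $h->[$m][0];
        $m = $r if $r < @$h && $h->[$r][0] < $h->[$m][0];
        last if $m == $i;
        @$h[$m, $i] = @$h[$i, $m];
        $i = $m;
    }
    return $top;
}

sub _touch {
    my ($self, $key) = @_;
    my $t = ++$self->{clock};
    $self->{stamp}{$key} = $t;
    $self->_push([$t, $key]);
    return;
}

sub get {
    my ($self, $key) = @_;
    return unless exists $self->{data}{$key};
    $self->_touch($key);
    return $self->{data}{$key};
}

sub set {
    my ($self, $key, $value) = @_;
    $self->{data}{$key} = $value;
    $self->_touch($key);
    while (scalar(keys %{ $self->{data} }) > $self->{cap}) {
        my $item = $self->_pop or last;
        my ($t, $k) = @$item;
        # skip stale entries: the key was accessed again after this push
        next unless exists $self->{stamp}{$k} && $self->{stamp}{$k} == $t;
        delete $self->{data}{$k};
        delete $self->{stamp}{$k};
    }
    return;
}

package main;
```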
>>> Btw: gitweb and cgit seem to acquire the same features these days:
>>> cgit recently got pagination + search on the project list.
>>
>> I haven't checked what features cgit has lately...
>>
>> Gitweb development seems a bit stalled; I got no response to the latest
>> round of the gitweb TODO and wishlist...
...so you would have to turn for example to git-php, gitorious and
github for inspiration.
> Well, I for one found the wishlist interesting; I've been pondering on
> implementing a graphic log in cgit (inspired by git-forest and
> git-graph), but I refuse to perform a topo-sort ;-)
>
> Hopefully I can exploit the fact that cgit never uses more than one
> commit as starting point for log traversal, combined with heuristics
> on commit date, to enable a fast graphic log that will be correct for
> all but the most pathological cases.
I think if you wait for graphing API to make it into released version,
you (well, cgit) would be able to use it for "fast graphic log".
--
Jakub Narebski
Poland
Thread overview: 20+ messages
2008-05-01 10:20 [RFC/PATCH] gitweb: Simplify git_project_list_body Jakub Narebski
2008-05-02 10:30 ` [RFC/PATCH] gitweb: Allow project description in project_index file Jakub Narebski
2008-05-02 13:04 ` Miklos Vajna
2008-05-03 9:03 ` Jakub Narebski
2008-05-04 2:03 ` Miklos Vajna
2008-05-09 13:23 ` [RFC/PATCH] gitweb: Project search Jakub Narebski
2008-05-10 9:28 ` [RFC/PATCH] gitweb: Paginate project list Jakub Narebski
2008-05-10 18:28 ` J.H.
2008-05-10 22:32 ` Jakub Narebski
2008-05-11 5:53 ` J.H.
2008-05-11 23:51 ` Jakub Narebski
[not found] ` <8c5c35580805102356p7e5532aah319af921f9b19392@mail.gmail.com>
2008-05-12 7:03 ` Jakub Narebski
2008-05-12 15:43 ` Lars Hjemli
2008-05-13 6:55 ` Jakub Narebski
[not found] ` <8c5c35580805130939m1a1ef8e0yd72402f3c79190ea@mail.gmail.com>
2008-05-13 16:46 ` Lars Hjemli
2008-05-13 17:04 ` Jakub Narebski
2008-05-13 19:11 ` Kristian Høgsberg
2008-05-13 19:30 ` Lars Hjemli
2008-05-13 23:28 ` Jakub Narebski
2008-05-14 7:59 ` Jakub Narebski