* [PATCH] gitweb: Use File::Find::find in git_get_projects_list
@ 2006-09-14 6:39 Jakub Narebski
2006-09-14 7:37 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 6:39 UTC (permalink / raw)
To: git
Earlier code to get list of projects when $projects_list is a
directory (e.g. when it is equal to $projectroot) had a hardcoded flat
(one level) list of directories. Allow for projects to be in
subdirectories also for $projects_list being a directory by using
File::Find.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This doesn't add much overhead to "project_list" view,
compared to previous version; times are the same within margin
of error.
gitweb/gitweb.perl | 29 +++++++++++++++++++++--------
1 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c3544dd..470bff2 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -715,16 +715,29 @@ sub git_get_projects_list {
if (-d $projects_list) {
# search in directory
my $dir = $projects_list;
- opendir my ($dh), $dir or return undef;
- while (my $dir = readdir($dh)) {
- if (-e "$projectroot/$dir/HEAD") {
- my $pr = {
- path => $dir,
- };
- push @list, $pr
+ my $pfxlen = length("$dir");
+
+ sub wanted {
+ # skip dot files (hidden files), check only directories
+ #return if (/^\./);
+ return unless (-d $_);
+
+ my $subdir = substr($File::Find::name, $pfxlen + 1);
+ # we check related file in $projectroot
+ if (-e "$projectroot/$subdir/HEAD") {
+ push @list, { path => $subdir };
+ $File::Find::prune = 1;
}
}
- closedir($dh);
+
+ File::Find::find({
+ no_chdir => 1, # do not change directory
+ follow_fast => 1, # follow symbolic links
+ #follow_skip => 2, # ignore duplicated files and directories
+ dangling_symlinks => 0, # ignore dangling symlinks, silently
+ wanted => \&wanted,
+ }, "$dir");
+
} elsif (-f $projects_list) {
# read from file(url-encoded):
# 'git%2Fgit.git Linus+Torvalds'
--
1.4.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 6:39 [PATCH] gitweb: Use File::Find::find in git_get_projects_list Jakub Narebski
@ 2006-09-14 7:37 ` Junio C Hamano
2006-09-14 7:59 ` Jakub Narebski
2006-09-14 17:39 ` [PATCH (amend)] " Jakub Narebski
2006-09-14 19:34 ` [PATCH (take 3)] " Jakub Narebski
2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-14 7:37 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> + sub wanted {
> + # skip dot files (hidden files), check only directories
> + #return if (/^\./);
Leftover comment?
> + my $subdir = substr($File::Find::name, $pfxlen + 1);
> + # we check related file in $projectroot
> + if (-e "$projectroot/$subdir/HEAD") {
> + push @list, { path => $subdir };
> + $File::Find::prune = 1;
We might want to do an extra cheap check to make what we found
is sane, to prevent us getting confused by a random file whose
name happens to be HEAD.
For example, it is a regular file whose contents is a single
line and begins with "ref: refs/heads/" (16 bytes) or it is a
symlink and readlink result begins with "refs/heads/" (11
bytes).
If you feel opening and reading the file is an added overhead,
checking for $project/$subdir/{objects,refs}/ directories might
be a good alternative.
> + File::Find::find({
> + no_chdir => 1, # do not change directory
> + follow_fast => 1, # follow symbolic links
What is the reason behind choosing follow_fast? By saying
follow_anything, you choose to care about cases where there are
symlinks under projectroot to point at various projects. If
that is the case, don't you want to make sure you include the
same project only once?
> + #follow_skip => 2, # ignore duplicated files and directories
Leftover comment?
About these two leftover comments, if you decided you did not
want them, please do not leave them behind.
If on the other hand you wanted to hint others that you are not
sure about your decision, it would probably be better to say
that honestly in the comment, perhaps mark the message as RFC,
and describe what the issues are, like so:
sub wanted {
# We might want to also ignore dot files, by
# saying "return if /^\./;" here, but there is
# no inherent reason for us to forbid a repository
# name from starting with a dot.
# We check only if a directory looks like a git
# repo and do not care about non directories.
# Note that this cannot be done with "-d _"
# because we are using follow_fast and the last
# stat was done with lstat(); we want to catch a
# symlink that points at a directory.
return unless -d $_;
...
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 7:37 ` Junio C Hamano
@ 2006-09-14 7:59 ` Jakub Narebski
2006-09-14 9:23 ` Junio C Hamano
2006-09-14 9:40 ` Junio C Hamano
0 siblings, 2 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 7:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
Dnia czwartek 14. września 2006 09:37, napisałeś:
> Jakub Narebski <jnareb@gmail.com> writes:
>
> > + sub wanted {
> > + # skip dot files (hidden files), check only directories
> > + #return if (/^\./);
>
> Leftover comment?
Leftover comment, from copying anonymous 'wanted' subroutine
from git_get_refs_list. But I have realized that for gitweb
for only one project one could have ".git" as a project name,
e.g. by putting $projectroot to be live git repository (working
directory of git repository).
> > + my $subdir = substr($File::Find::name, $pfxlen + 1);
> > + # we check related file in $projectroot
> > + if (-e "$projectroot/$subdir/HEAD") {
> > + push @list, { path => $subdir };
> > + $File::Find::prune = 1;
>
> We might want to do an extra cheap check to make what we found
> is sane, to prevent us getting confused by a random file whose
> name happens to be HEAD.
That is what we did before. Simplest check, also to avoid now to
claim top directory as git repository, and to know when to cut-off
(prune) finding.
It was intended I think to avoid adding '.' and '..' as git
repositories, not stray directories. Well, perhaps index file
if it was used.
> For example, it is a regular file whose contents is a single
> line and begins with "ref: refs/heads/" (16 bytes) or it is a
> symlink and readlink result begins with "refs/heads/" (11
> bytes).
We can do that, but I think it is unnecessary. Let's assume that
$projectroot contains _only_ git repositories, perhaps in subdirs
(directory hierarchy), and perhaps some stray files like not used
now index file.
> If you feel opening and reading the file is an added overhead,
> checking for $project/$subdir/{objects,refs}/ directories might
> be a good alternative.
Probably overkill.
> > + File::Find::find({
> > + no_chdir => 1, # do not change directory
> > + follow_fast => 1, # follow symbolic links
>
> What is the reason behind choosing follow_fast? By saying
> follow_anything, you choose to care about cases where there are
> symlinks under projectroot to point at various projects. If
> that is the case, don't you want to make sure you include the
> same project only once?
First, it is faster. Second, for testing if it works I used copy
of a one "live" git repository I have (git.git repository), by making
second symlink to it.
> > + #follow_skip => 2, # ignore duplicated files and directories
>
> Leftover comment?
Leftover from benchmarking what set of options is faster.
By the way, if we choose to use 'follow' instead of 'follow_fast' we
might want to uncomment it, to not spew errors in the log.
> About these two leftover comments, if you decided you did not
> want them, please do not leave them behind.
O.K.
> If on the other hand you wanted to hint others that you are not
> sure about your decision, it would probably be better to say
> that honestly in the comment, perhaps mark the message as RFC,
> and describe what the issues are, like so:
>
> sub wanted {
> # We might want to also ignore dot files, by
> # saying "return if /^\./;" here, but there is
> # no inherent reason for us to forbid a repository
> # name from starting with a dot.
True.
> # We check only if a directory looks like a git
> # repo and do not care about non directories.
> # Note that this cannot be done with "-d _"
> # because we are using follow_fast and the last
> # stat was done with lstat(); we want to catch a
> # symlink that points at a directory.
> return unless -d $_;
> ...
Not true. Link to directory is both -d $_ and -l $_, so
return unless (-d $_ || (-l $_ && -d readlink($_)));
is not needed.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 7:59 ` Jakub Narebski
@ 2006-09-14 9:23 ` Junio C Hamano
2006-09-14 18:32 ` Junio C Hamano
2006-09-14 9:40 ` Junio C Hamano
1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-14 9:23 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> Not true. Link to directory is both -d $_ and -l $_, so
>
> return unless (-d $_ || (-l $_ && -d readlink($_)));
>
> is not needed.
I think you mis-read what I said. I first wondered why you did
not say "return unless -d _" and wrote (seemingly more
inefficient) "return unless -d $_". The comment is to clarify
why '$' is needed.
In other words, after this setup:
$ rm -fr d dl
$ mkdir d
$ ln -s d dl
you do not see an output from this:
$ perl -e 'lstat "dl"; print "is-dir\n" if -d _;'
but you do from this:
$ perl -e 'lstat "dl"; print "is-dir\n" if -d "dl";'
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 7:59 ` Jakub Narebski
2006-09-14 9:23 ` Junio C Hamano
@ 2006-09-14 9:40 ` Junio C Hamano
2006-09-14 10:01 ` Jakub Narebski
1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-14 9:40 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> ... But I have realized that for gitweb
> for only one project one could have ".git" as a project name,
> e.g. by putting $projectroot to be live git repository (working
> directory of git repository).
I do not understand this comment. I have /git/{git,linux-2.6}
and run test gitweb with projectroot set to /git to see it can
display them both. Both are with working tree, so there are
/git/git/.git and /git/linux-2.6/.git directories.
Do you mean to say "project name" is always taken from the last
component of the directory name and the above setup would result
in ambiguity? If that is the case that sounds awful, but I
haven't noticed it.
> It was intended I think to avoid adding '.' and '..' as git
> repositories, not stray directories. Well, perhaps index file
> if it was used.
Having and not having index are both valid, so there is no value
in checking the index, even if we wanted to be more paranoid.
Existence of HEAD, refs/heads, and/or objects/ would be a more
meaningful alternative, but again only if we wanted to be more
paranoid.
>> > + File::Find::find({
>> > + no_chdir => 1, # do not change directory
>> > + follow_fast => 1, # follow symbolic links
>>
>> What is the reason behind choosing follow_fast? By saying
>> follow_anything, you choose to care about cases where there are
>> symlinks under projectroot to point at various projects. If
>> that is the case, don't you want to make sure you include the
>> same project only once?
>
> First, it is faster. Second, for testing if it works I used copy
> of a one "live" git repository I have (git.git repository), by making
> second symlink to it.
That was not what I wanted to ask; slower and correct is always
preferred over fast and incorrect. I did not see anything that
compensates the duplicates follow_fast might give you in the
code, so I wondered there were some other trick you used to avoid
it.
In other words, "because I have such and such check to avoid
duplicates, so I can safely use 'follow_fast', without using
slower 'follow'" was the answer I was after.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 9:40 ` Junio C Hamano
@ 2006-09-14 10:01 ` Jakub Narebski
2006-09-14 16:42 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 10:01 UTC (permalink / raw)
To: Junio C Hamano, git
Dnia czwartek 14. września 2006 11:40, napisałeś:
> Jakub Narebski <jnareb@gmail.com> writes:
>
> > ... But I have realized that for gitweb
> > for only one project one could have ".git" as a project name,
> > e.g. by putting $projectroot to be live git repository (working
> > directory of git repository).
>
> I do not understand this comment. I have /git/{git,linux-2.6}
> and run test gitweb with projectroot set to /git to see it can
> display them both. Both are with working tree, so there are
> /git/git/.git and /git/linux-2.6/.git directories.
>
> Do you mean to say "project name" is always taken from the last
> component of the directory name and the above setup would result
> in ambiguity? If that is the case that sounds awful, but I
> haven't noticed it.
Sorry, this comment was leftover from before, when no_chdir was not
used. Then $_ was the last part of directory, and 'return if (/^\./);'
would not take into consideration git/.git nor linux-2.6/.git as
valid project names ($projectroot / $projects_list is stripped from
project name).
Now the test should read "if (m!^.*/\.!)" if we want to skip
dot files and dot directories.
> > It was intended I think to avoid adding '.' and '..' as git
> > repositories, not stray directories. Well, perhaps index file
> > if it was used.
>
> Having and not having index are both valid, so there is no value
> in checking the index, even if we wanted to be more paranoid.
> Existence of HEAD, refs/heads, and/or objects/ would be a more
> meaningful alternative, but again only if we wanted to be more
> paranoid.
Sorry for the confusion. By index I meant alternate way of specyfying
projects, and up till now the only way to specify hierarchical
repository structure, namely $projects_list being file with encoded
directories and encoded owners (it was named index.aux or
index/index.aux by default).
> >> > + File::Find::find({
> >> > + no_chdir => 1, # do not change directory
> >> > + follow_fast => 1, # follow symbolic links
> >>
> >> What is the reason behind choosing follow_fast? By saying
> >> follow_anything, you choose to care about cases where there are
> >> symlinks under projectroot to point at various projects. If
> >> that is the case, don't you want to make sure you include the
> >> same project only once?
> >
> > First, it is faster. Second, for testing if it works I used copy
> > of a one "live" git repository I have (git.git repository), by
> > making second symlink to it.
>
> That was not what I wanted to ask; slower and correct is always
> preferred over fast and incorrect. I did not see anything that
> compensates the duplicates follow_fast might give you in the
> code, so I wondered there were some other trick you used to avoid
> it.
>
> In other words, "because I have such and such check to avoid
> duplicates, so I can safely use 'follow_fast', without using
> slower 'follow'" was the answer I was after.
First of all, why duplicated repositories are considered error?
It is perhaps error/mistake (besides testing of course) in doing the
layout of repositories (projectroot), but it is not that something bad
would happen if there are duplicated repositories.
Second, follow_fast is faster, especially that in gitweb we would
have to set "follow_skip => 2" to not spew errors to web server log when
there are duplicated repositories.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 10:01 ` Jakub Narebski
@ 2006-09-14 16:42 ` Junio C Hamano
2006-09-14 17:02 ` Jakub Narebski
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-14 16:42 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> Sorry, this comment was leftover from before, when no_chdir was not
> used. Then $_ was the last part of directory,...
Ah, thanks. I missed that difference. Did you choose to use
no_chdir for performance reasons or coding convenience (somehow
I had an impression that no_chdir would be slower)?
> Sorry for the confusion. By index I meant alternate way of specyfying
> projects,...
Again, thanks for clarification. I forgot about the handcrafted
project index file gitweb uses.
> First of all, why duplicated repositories are considered
> error? ... it is not that something bad would happen if there
> are duplicated repositories.
Fair enough, and I agree that that is a sane reasoning. Thanks.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 16:42 ` Junio C Hamano
@ 2006-09-14 17:02 ` Jakub Narebski
2006-09-14 17:12 ` Randal L. Schwartz
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 17:02 UTC (permalink / raw)
To: git
Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> Sorry, this comment was leftover from before, when no_chdir was not
>> used. Then $_ was the last part of directory,...
>
> Ah, thanks. I missed that difference. Did you choose to use
> no_chdir for performance reasons or coding convenience (somehow
> I had an impression that no_chdir would be slower)?
First benchmarks showed that no_chdir was some faster. I have rechecked,
and they are the same within the margin of error, perhaps without
no_chdir is slightly faster. 447 +/- 11 ms vs. 450 +/- 10 ms according
to ApacheBench (ab -n 10).
So it is really the matter of convenience. The default (without no_chdir) is
I think more convenient.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 17:02 ` Jakub Narebski
@ 2006-09-14 17:12 ` Randal L. Schwartz
0 siblings, 0 replies; 19+ messages in thread
From: Randal L. Schwartz @ 2006-09-14 17:12 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
>>>>> "Jakub" == Jakub Narebski <jnareb@gmail.com> writes:
Jakub> First benchmarks showed that no_chdir was some faster. I have rechecked,
Jakub> and they are the same within the margin of error, perhaps without
Jakub> no_chdir is slightly faster. 447 +/- 11 ms vs. 450 +/- 10 ms according
Jakub> to ApacheBench (ab -n 10).
Any benchmarks will certainly depend on the O/S as well. The advantage to
chdir is that the name-to-inode lookups don't need to keep retraversing the
upper directories. And keep in mind that caching will affect the benchmarks
on that.
As for that whole -d thing...
If you use _, you have to keep in mind whether the previous call was a stat()
or an lstat(). Both lstat() (explictly) and -l use lstat(). Everything else
is a stat(). So, "-l and not -d" uses an lstat to determine that we're
looking at a symlink, but a stat to determine that the item pointed at is not
a directory. NEVER use readlink(), as it very likely will require a lot of
flattening for you to simulate what the OS does. (See my article on that at
<http://www.stonehenge.com/merlyn/UnixReview/col27.html>.)
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH (amend)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 6:39 [PATCH] gitweb: Use File::Find::find in git_get_projects_list Jakub Narebski
2006-09-14 7:37 ` Junio C Hamano
@ 2006-09-14 17:39 ` Jakub Narebski
2006-09-14 19:14 ` Jakub Narebski
2006-09-14 21:50 ` Randal L. Schwartz
2006-09-14 19:34 ` [PATCH (take 3)] " Jakub Narebski
2 siblings, 2 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 17:39 UTC (permalink / raw)
To: git; +Cc: Junio Hamano
Earlier code to get list of projects when $projects_list is a
directory (e.g. when it is equal to $projectroot) had a hardcoded flat
(one level) list of directories. Allow for projects to be in
subdirectories also for $projects_list being a directory by using
File::Find.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 26 ++++++++++++++++++--------
1 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c3544dd..27641a6 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -715,16 +715,26 @@ sub git_get_projects_list {
if (-d $projects_list) {
# search in directory
my $dir = $projects_list;
- opendir my ($dh), $dir or return undef;
- while (my $dir = readdir($dh)) {
- if (-e "$projectroot/$dir/HEAD") {
- my $pr = {
- path => $dir,
- };
- push @list, $pr
+ my $pfxlen = length("$dir");
+
+ sub wanted {
+ # only directories can be git repositories
+ return unless (-d $_);
+
+ my $subdir = substr($File::Find::name, $pfxlen + 1);
+ # we check related file in $projectroot
+ if (-e "$projectroot/$subdir/HEAD") {
+ push @list, { path => $subdir };
+ $File::Find::prune = 1;
}
}
- closedir($dh);
+
+ File::Find::find({
+ follow_fast => 1, # follow symbolic links
+ dangling_symlinks => 0, # ignore dangling symlinks, silently
+ wanted => \&wanted,
+ }, "$dir");
+
} elsif (-f $projects_list) {
# read from file(url-encoded):
# 'git%2Fgit.git Linus+Torvalds'
--
1.4.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 9:23 ` Junio C Hamano
@ 2006-09-14 18:32 ` Junio C Hamano
2006-09-14 18:43 ` Jakub Narebski
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-14 18:32 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Junio C Hamano <junkio@cox.net> writes:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> Not true. Link to directory is both -d $_ and -l $_, so
>>
>> return unless (-d $_ || (-l $_ && -d readlink($_)));
>>
>> is not needed.
>
> I think you mis-read what I said. I first wondered why you did
> not say "return unless -d _" and wrote (seemingly more
> inefficient) "return unless -d $_". The comment is to clarify
> why '$' is needed.
>
> In other words, after this setup:
>
> $ rm -fr d dl
> $ mkdir d
> $ ln -s d dl
>
> you do not see an output from this:
>
> $ perl -e 'lstat "dl"; print "is-dir\n" if -d _;'
>
> but you do from this:
>
> $ perl -e 'lstat "dl"; print "is-dir\n" if -d "dl";'
Side note:
While
return unless -d $_
there is definitely more correct than "return unless -d _" which
is not, it is not the most efficient. Because you use fast_xxx,
you know the last stat was lstat so "-d _" would be true if the
thing you are looking at is a real directory and will be a
zero-cost operation. The only case you want to be careful is a
symlink pointing at a directory, so
return unless ((-d _) || (-l _ && -d $_))
would be more efficient.
I have a strange suspicion that Merlyn will soon join us with
more expertise if we keep talking about Perl ;-)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 18:32 ` Junio C Hamano
@ 2006-09-14 18:43 ` Jakub Narebski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 18:43 UTC (permalink / raw)
To: git
Junio C Hamano wrote:
> Side note:
>
> While
>
> return unless -d $_
>
> there is definitely more correct than "return unless -d _" which
> is not, it is not the most efficient. Because you use fast_xxx,
> you know the last stat was lstat so "-d _" would be true if the
> thing you are looking at is a real directory and will be a
> zero-cost operation. The only case you want to be careful is a
> symlink pointing at a directory, so
>
> return unless ((-d _) || (-l _ && -d $_))
>
> would be more efficient.
>
> I have a strange suspicion that Merlyn will soon join us with
> more expertise if we keep talking about Perl ;-)
Truth to be told, I didn't know about '-d _', and File::Find(3pm)
does talk only about $_.
Besides, I guess that the possible speedup is negligible, and of
course depend if for example the whole $projectroot aka. $projects_list
is for example populated by symlinks. Then it would be slower.
This is the place where more readable should win.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH (amend)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 17:39 ` [PATCH (amend)] " Jakub Narebski
@ 2006-09-14 19:14 ` Jakub Narebski
2006-09-14 19:51 ` Junio C Hamano
2006-09-14 21:50 ` Randal L. Schwartz
1 sibling, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 19:14 UTC (permalink / raw)
To: git
Gaah, the code spews quite a lot of warnings as of now.
Variable "$pfxlen" will not stay shared at gitweb.perl line 724.
Variable "@list" will not stay shared at gitweb.perl line 727.
...
--
Jakub Narebski
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH (take 3)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 6:39 [PATCH] gitweb: Use File::Find::find in git_get_projects_list Jakub Narebski
2006-09-14 7:37 ` Junio C Hamano
2006-09-14 17:39 ` [PATCH (amend)] " Jakub Narebski
@ 2006-09-14 19:34 ` Jakub Narebski
2006-09-14 20:13 ` Junio C Hamano
2 siblings, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 19:34 UTC (permalink / raw)
To: git; +Cc: Junio Hamano
Earlier code to get list of projects when $projects_list is a
directory (e.g. when it is equal to $projectroot) had a hardcoded flat
(one level) list of directories. Allow for projects to be in
subdirectories also for $projects_list being a directory by using
File::Find.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Use anonymous subroutine to avoid
Variable "@list" will not stay shared at gitweb.perl line 727.
warning. Check for the current directory to avoid substr outside
string warning.
gitweb/gitweb.perl | 30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c3544dd..bea75d3 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -715,16 +715,26 @@ sub git_get_projects_list {
if (-d $projects_list) {
# search in directory
my $dir = $projects_list;
- opendir my ($dh), $dir or return undef;
- while (my $dir = readdir($dh)) {
- if (-e "$projectroot/$dir/HEAD") {
- my $pr = {
- path => $dir,
- };
- push @list, $pr
- }
- }
- closedir($dh);
+ my $pfxlen = length("$dir");
+
+ File::Find::find({
+ follow_fast => 1, # follow symbolic links
+ dangling_symlinks => 0, # ignore dangling symlinks, silently
+ wanted => sub {
+ # skip current directory
+ return if (m!^/|.|..$!);
+ # only directories can be git repositories
+ return unless (-d $_);
+
+ my $subdir = substr($File::Find::name, $pfxlen + 1);
+ # we check related file in $projectroot
+ if (-e "$projectroot/$subdir/HEAD") {
+ push @list, { path => $subdir };
+ $File::Find::prune = 1;
+ }
+ },
+ }, "$dir");
+
} elsif (-f $projects_list) {
# read from file(url-encoded):
# 'git%2Fgit.git Linus+Torvalds'
--
1.4.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH (amend)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 19:14 ` Jakub Narebski
@ 2006-09-14 19:51 ` Junio C Hamano
0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-09-14 19:51 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> Gaah, the code spews quite a lot of warnings as of now.
>
> Variable "$pfxlen" will not stay shared at gitweb.perl line 724.
> Variable "@list" will not stay shared at gitweb.perl line 727.
> ...
Thanks for an early warning. Will not apply that one and
instead will wait for updates.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH (take 3)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 19:34 ` [PATCH (take 3)] " Jakub Narebski
@ 2006-09-14 20:13 ` Junio C Hamano
2006-09-14 20:18 ` [PATCH (take 4)] " Jakub Narebski
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-14 20:13 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> + wanted => sub {
> + # skip current directory
> + return if (m!^/|.|..$!);
Huh?
(1) Did you mean to say "\." (not any single character but
literally dot)?
(2) how does the alternatives within m{} construct bind (iow,
please be gentle to the readers)? Do you mean
return if (/^\/(?:\.|\.\.)$/)
in other words,
return if (/^\/\.$/ || /^\/\.\.$/)
in other words,
return if (/^\/\.{1,2}/)
???
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH (take 4)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 20:13 ` Junio C Hamano
@ 2006-09-14 20:18 ` Jakub Narebski
2006-09-14 20:24 ` Jakub Narebski
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 20:18 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
Earlier code to get list of projects when $projects_list is a
directory (e.g. when it is equal to $projectroot) had a hardcoded flat
(one level) list of directories. Allow for projects to be in
subdirectories also for $projects_list being a directory by using
File::Find.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c3544dd..25383bc 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -715,16 +715,26 @@ sub git_get_projects_list {
if (-d $projects_list) {
# search in directory
my $dir = $projects_list;
- opendir my ($dh), $dir or return undef;
- while (my $dir = readdir($dh)) {
- if (-e "$projectroot/$dir/HEAD") {
- my $pr = {
- path => $dir,
- };
- push @list, $pr
- }
- }
- closedir($dh);
+ my $pfxlen = length("$dir");
+
+ File::Find::find({
+ follow_fast => 1, # follow symbolic links
+ dangling_symlinks => 0, # ignore dangling symlinks, silently
+ wanted => sub {
+ # skip current directory
+ return if (m!^[/.]$!);
+ # only directories can be git repositories
+ return unless (-d $_);
+
+ my $subdir = substr($File::Find::name, $pfxlen + 1);
+ # we check related file in $projectroot
+ if (-e "$projectroot/$subdir/HEAD") {
+ push @list, { path => $subdir };
+ $File::Find::prune = 1;
+ }
+ },
+ }, "$dir");
+
} elsif (-f $projects_list) {
# read from file(url-encoded):
# 'git%2Fgit.git Linus+Torvalds'
--
1.4.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH (take 4)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 20:18 ` [PATCH (take 4)] " Jakub Narebski
@ 2006-09-14 20:24 ` Jakub Narebski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-09-14 20:24 UTC (permalink / raw)
To: git
Jakub Narebski wrote:
> + # skip current directory
Perhaps it would be better to say "skip $projects_list (top) directory".
I don't think it is important enough to resend a patch.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH (amend)] gitweb: Use File::Find::find in git_get_projects_list
2006-09-14 17:39 ` [PATCH (amend)] " Jakub Narebski
2006-09-14 19:14 ` Jakub Narebski
@ 2006-09-14 21:50 ` Randal L. Schwartz
1 sibling, 0 replies; 19+ messages in thread
From: Randal L. Schwartz @ 2006-09-14 21:50 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Junio Hamano
>>>>> "Jakub" == Jakub Narebski <jnareb@gmail.com> writes:
Jakub> + sub wanted {
Jakub> + # only directories can be git repositories
Jakub> + return unless (-d $_);
Jakub> +
Jakub> + my $subdir = substr($File::Find::name, $pfxlen + 1);
Jakub> + # we check related file in $projectroot
Jakub> + if (-e "$projectroot/$subdir/HEAD") {
Jakub> + push @list, { path => $subdir };
Jakub> + $File::Find::prune = 1;
Jakub> }
Jakub> }
Jakub> - closedir($dh);
Jakub> +
Jakub> + File::Find::find({
Jakub> + follow_fast => 1, # follow symbolic links
Jakub> + dangling_symlinks => 0, # ignore dangling symlinks, silently
Jakub> + wanted => \&wanted,
Jakub> + }, "$dir");
Don't define a sub inside a sub. That's the "won't stay shared"
problem.
Move that like this:
find( {
follow_fast => 1,
dangling_symlinks => 0,
wanted => sub {
... everything in &wanted above ...
}, $dir);
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-09-14 21:50 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-14 6:39 [PATCH] gitweb: Use File::Find::find in git_get_projects_list Jakub Narebski
2006-09-14 7:37 ` Junio C Hamano
2006-09-14 7:59 ` Jakub Narebski
2006-09-14 9:23 ` Junio C Hamano
2006-09-14 18:32 ` Junio C Hamano
2006-09-14 18:43 ` Jakub Narebski
2006-09-14 9:40 ` Junio C Hamano
2006-09-14 10:01 ` Jakub Narebski
2006-09-14 16:42 ` Junio C Hamano
2006-09-14 17:02 ` Jakub Narebski
2006-09-14 17:12 ` Randal L. Schwartz
2006-09-14 17:39 ` [PATCH (amend)] " Jakub Narebski
2006-09-14 19:14 ` Jakub Narebski
2006-09-14 19:51 ` Junio C Hamano
2006-09-14 21:50 ` Randal L. Schwartz
2006-09-14 19:34 ` [PATCH (take 3)] " Jakub Narebski
2006-09-14 20:13 ` Junio C Hamano
2006-09-14 20:18 ` [PATCH (take 4)] " Jakub Narebski
2006-09-14 20:24 ` Jakub Narebski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).