All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Kacper Kornet <draenog@pld-linux.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] gitweb: Option to omit column with time of the last change
Date: Wed, 18 Apr 2012 01:36:08 +0200	[thread overview]
Message-ID: <201204180136.08570.jnareb@gmail.com> (raw)
In-Reply-To: <20120416213938.GB22574@camk.edu.pl>

On Mon, 16 Apr 2012, Kacper Kornet wrote:
> On Mon, Apr 16, 2012 at 10:06:49PM +0200, Jakub Narebski wrote:
>> On Mon, 16 Apr 2012, Kacper Kornet wrote:
>>> On Sat, Apr 14, 2012 at 03:16:01PM +0200, Jakub Narebski wrote:

>>>>   That saves I/O, but not fork.
>> 
>> Actually if you look at the footer of projects list page with 'timed'
>> feature enable you see that for N projects on list, gitweb uses 2*N+1
>> git commands.  The "+1" part is from "git version", the "2*N" are from
>> git-for-each-ref to get last activity (and verify repository as a
>> side-effect)...
> 
> It is how I started to think about the problem. With my additional patch
> to remove the owner I am able to reduce the number of git invocations to
> 1.

That is a very good thing.

Especially that adding caching to gitweb is long in coming (to core 
gitweb at least), and that not always one is able to turn on caching.

[...]
>>> My tests show that forks are also a bottleneck in my setup.
>> 
>> What do you mean by "my tests" here?  Is it benchmark (perhaps just using
>> 'timed' feature) with and without custom change removing fork(s)?  Or did
>> you used profiler (e.g. wondefull Devel::NYTProf) for that?
> 
> Nothing fancy. I look at the footnote produced by "timed" feature. And
> I see a difference between version with the following patch:
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 18cdf96..4a13807 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3156,6 +3156,18 @@ sub git_get_project_owner {
>  	return $owner;
>  }
>  
> +sub git_repo_exist {

Perhaps a better name would be validate_headref() or check_head(),
called from subroutine named either is_git_directory() as in setup.c,
or verify_repo()...

> +	my ($path) = @_;

BTW this could be written as

  +	my $path = shift;

though it is largely a matter of taste.

> +       my $fd;
> +
> +       $git_dir = "$projectroot/$path";
> +       open($fd, "<", "$git_dir/HEAD") or return;

It can be written as

  +	open my $fd, "<", "$projectroot/$path/HEAD"
  +		or return;

> +       my $line = <$fd>;
> +       close $fd or return;

Shouldn't we chomp($line)?

> +       return 1 if (defined $fd && substr($line, 0, 10) eq 'ref:refs/' 
> +           || $line=~m/^[0-9a-z]{40}$/);
> +       return 0;

I don't think we need to check that $fd is defined; if it isn't, we would
return earlier I think.

There can be space between "ref:" and fully qualified branch name, and in
fact git puts such space:

  $ cat .git/HEAD 
  ref: refs/heads/gitweb/web

Also, you can return boolean value.

So the above would reduce to:

  +	return ($line =~ m!^ref:\s*refs/! ||
  +	        $line =~ m!^[0-9a-z]{40}$!);

> +}
> +
>  sub git_get_last_activity {
>  	my ($path) = @_;
>  	my $fd;
> @@ -5330,6 +5342,7 @@ sub fill_project_list_info {
>  	my $show_ctags = gitweb_check_feature('ctags');
>   PROJECT:
>  	foreach my $pr (@$projlist) {
> +             next PROJECT unless git_repo_exist($pr->{'path'});

I understand that it is proof of concept patch, but I think that
in real patch iy would be better to update check_export_ok() instead
of this addition.

>  		if (project_info_needs_filling($pr, $filter_set->('age', 'age_string'))) {
>  			my (@activity) = git_get_last_activity($pr->{'path'});
>  			unless (@activity) {
> 
> 
> and the one in which  git_repo_exist() uses invocation to /bin/true:
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 18cdf96..4bcc66f 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3156,6 +3156,13 @@ sub git_get_project_owner {
>  	return $owner;
>  }
>  
> +sub git_repo_exist {
> +	my ($path) = @_;
> +
> +        $git_dir = "$projectroot/$path";
> +        return not system('/bin/true');
> +}
> +

What were the differences in timing?

>>>                                                             On the other 
>>> hand I think that I can trust that my projects.list contains only valid
>>> repositories. So I would prefer to have a don't verify option. Or there
>>> is a possibility to write perl function with the same functionality as
>>> is_git_directory() from setup.c and use it to verify if the directory is a
>>> valid git repo.
>> 
>> Well, we can add those checks to check_export_ok()... well to function
>> it would call.
>> 
>> is_git_repository from setup.c checks that "/objects" and "/refs"
>> have executable permissions, and that "/HEAD" is valid via validate_headref
>> which does slightly more than (now slightly misnamed) check_head_link()
>> from gitweb...
>> 
>> ...or that DB_ENVIRONMENT i.e. GIT_OBJECT_DIRECTORY environment variable
>> is set, and path that it points to has executable permissions.  I don't
>> think we have to worry about this for gitweb.
> 
>> I'll try to come up with a patch to gitweb that improves repository
>> verification, and perhaps a patch that uses the fact that "git config"
>> succeeded to verify repository.
> 
> As you see it is more or less what I have already written for my tests.
> I only don't check if /objects and /refs are directories. If you want I
> can send proper patch submission for this function

I don't know how strict we should be, but "/objects" and "/refs" are just
one stat more.


Anyway, if you plan on resending this patch series, then "gitweb: Improve
repository verification" should be be first, I think.

-- 
Jakub Narebski
Poland

  reply	other threads:[~2012-04-17 23:35 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-03 13:27 [PATCH] gitweb: Option to omit column with time of the last change Kacper Kornet
2012-04-03 23:12 ` Jakub Narebski
2012-04-04  6:39   ` Kacper Kornet
2012-04-04 14:31     ` Jakub Narebski
2012-04-04 16:22       ` Kacper Kornet
2012-04-14 13:16         ` Jakub Narebski
2012-04-16 10:12           ` Kacper Kornet
2012-04-16 20:06             ` Jakub Narebski
2012-04-16 21:39               ` Kacper Kornet
2012-04-17 23:36                 ` Jakub Narebski [this message]
2012-04-19 16:07                   ` [PATCH] gitweb: Improve repository verification Jakub Narebski
2012-04-19 18:30                     ` Junio C Hamano
2012-04-19 19:46                       ` Jakub Narebski
2012-04-21 11:28                         ` Jakub Narebski
2012-04-24 17:39                     ` [PATCH 1/2] gitweb: Option to omit column with time of the last change Kacper Kornet
2012-04-24 17:41                     ` [PATCH 2/2] gitweb: Option to not display information about owner Kacper Kornet
2012-04-26  4:39                       ` Junio C Hamano
2012-04-26 15:07                         ` Kacper Kornet
2012-04-26 15:53                           ` Junio C Hamano
2012-04-26 16:35                             ` Kacper Kornet
2012-04-26 16:45                               ` [PATCH v2 " Kacper Kornet
2012-04-24 17:36                   ` [PATCH] gitweb: Option to omit column with time of the last change Kacper Kornet
2012-04-04 17:14       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201204180136.08570.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=draenog@pld-linux.org \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.