Git development
* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 16:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120906210.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 12 Jun 2006, Jon Smirl wrote:
> >
> >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 14525 jonsmirl  16   0  604m 391m 1904 S   24 38.7 916:53.39 git-svnimport
> > 20947 jonsmirl  17   0     0    0    0 R    1  0.0   0:00.03 git-svnimport
>
> Hard to tell, it's obviously got short-lived processes there too that it's
> not showing, but equally obviously that svnimport script itself is
> spending an alarming amount of CPU time. I don't think it should do that
> much processing, but since it's written in perl, I can't read it.
>
> Are there any other directories that seem to be growing (eg some temp-file
> directory where the old files aren't cleaned away?). I can't imagine what
> else it could be doing in kernel space than simply some silly filesystem
> operation, but dang it all, Linux filesystems are usually very efficient
> indeed, unless we're talking huge directories (and if it's not the git
> object directory any more, it must be something else).

64 files in tmp.
But the SVN repository itself has 411,000 files in it. Split between
two directories.

Is there some pack equivalent for svn that I haven't found yet?

> At least with the cvs importer I have _some_ clue what it's doing, since I
> wrote an earlier version myself (very different, but at least I know what
> the operations are). SVN has always just confused me, and I have no idea
> what svnimport does, so I think I'll have to defer to somebody who
> actually knows the code.
>
> Smurf, have you looked at any larger repositories?
>
>                 Linus
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: svn to git, N-squared?
From: Randal L. Schwartz @ 2006-06-12 16:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jon Smirl, git
In-Reply-To: <Pine.LNX.4.64.0606112028010.5498@g5.osdl.org>

>>>>> "Linus" == Linus Torvalds <torvalds@osdl.org> writes:

Linus> This sounds like _exactly_ what happens if you don't repack
Linus> occasionally.  Especially if you are using a filesystem without hashed
Linus> filename lookup, but it's true to some degree even with that - the
Linus> filesystem tends to end up spending tons of time in kernel space,
Linus> trying to find a place to put new objects.

I'm using git-svn to do a similar thing with an 11K-commit history.  It's now 4
days running, and yes, I'm repacking and deleting empty dirs every 200-300
commits, but I'm only up to commit 4000 or so.  At this rate, I *may* finish
by sometime next week. :(

However, I notice one thing that can't be good: .git/git-svn/revs has one file
per revision.  Yes, I'll end up with 11000 files in a single directory.  Ugh.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

^ permalink raw reply

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 16:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120843340.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 12 Jun 2006, Jon Smirl wrote:
> >
> > I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
> > any way to tell what it is doing in the kernel for so much time?
>
> oprofile will tell you.

I don't have profiling turned on in the kernel. I've turned it on so
I'll pick it up next time I reboot.
I'll kill everything and restart when my new RAM arrives tomorrow.

Hopefully the SVN import will finish before then but it doesn't look likely.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12 16:12 UTC (permalink / raw)
  To: Jon Smirl; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120855p1cec9acfy62dadb89c11756b4@mail.gmail.com>



On Mon, 12 Jun 2006, Jon Smirl wrote:
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 14525 jonsmirl  16   0  604m 391m 1904 S   24 38.7 916:53.39 git-svnimport
> 20947 jonsmirl  17   0     0    0    0 R    1  0.0   0:00.03 git-svnimport

Hard to tell, it's obviously got short-lived processes there too that it's 
not showing, but equally obviously that svnimport script itself is 
spending an alarming amount of CPU time. I don't think it should do that 
much processing, but since it's written in perl, I can't read it.

Are there any other directories that seem to be growing (eg some temp-file 
directory where the old files aren't cleaned away?). I can't imagine what 
else it could be doing in kernel space than simply some silly filesystem 
operation, but dang it all, Linux filesystems are usually very efficient 
indeed, unless we're talking huge directories (and if it's not the git 
object directory any more, it must be something else).
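
One way to look for such a directory is to count entries per directory and
sort. This is only a rough sketch: dir_entries and largest_dirs are made-up
helper names, and it assumes a find that supports -mindepth and -maxdepth
(GNU and BSD find do).

```shell
# Count the immediate entries (files and subdirectories) of one directory.
dir_entries() {
    # -mindepth/-maxdepth are GNU/BSD extensions, not POSIX.
    # The $(( )) wrapper strips the padding some wc implementations add.
    echo $(( $(find "$1" -mindepth 1 -maxdepth 1 | wc -l) ))
}

# Print "count path" for each directory given, largest first.
largest_dirs() {
    for d in "$@"; do
        [ -d "$d" ] && printf '%s %s\n' "$(dir_entries "$d")" "$d"
    done | sort -rn
}
```

Something like "largest_dirs .git/objects/?? /tmp" run a few minutes apart
would show whether any of the candidates keeps growing.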

At least with the cvs importer I have _some_ clue what it's doing, since I 
wrote an earlier version myself (very different, but at least I know what 
the operations are). SVN has always just confused me, and I have no idea 
what svnimport does, so I think I'll have to defer to somebody who 
actually knows the code.

Smurf, have you looked at any larger repositories?

		Linus

^ permalink raw reply

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 15:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120843340.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 12 Jun 2006, Jon Smirl wrote:
> >
> > I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
> > any way to tell what it is doing in the kernel for so much time?
>
> oprofile will tell you.
>
> I don't see why it would spend a lot of time in the kernel, unless it's
> the SVN part that does a ton of reads or something. git should have almost
> no kernel footprint apart from the individual objects creation/reading, so
> once it's repacked, I generally see very little system time.
>
> What does top say? (Ie can you see _which_ process spends time in the
> kernel?)

top - 11:54:32 up 4 days,  1:27,  5 users,  load average: 1.85, 1.74, 1.55
Tasks: 135 total,   2 running, 133 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.7% us, 35.3% sy,  0.0% ni, 49.3% id,  0.0% wa,  0.2% hi,  0.5% si,  0
Mem:   1035740k total,  1020836k used,    14904k free,    18368k buffers
Swap: 118222276k total,   645124k used, 117577152k free,   183172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14525 jonsmirl  16   0  604m 391m 1904 S   24 38.7 916:53.39 git-svnimport
20947 jonsmirl  17   0     0    0    0 R    1  0.0   0:00.03 git-svnimport
20864 jonsmirl  16   0  2120 1024  788 R    1  0.1   0:00.08 top
 2436 root      15   0 71184  28m 6100 S    0  2.8 119:13.55 Xorg
    1 root      16   0  1992  340  312 S    0  0.0   0:00.79 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:01.42 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12 15:45 UTC (permalink / raw)
  To: Jon Smirl; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120832xaf74e77pad7f70df864541fc@mail.gmail.com>



On Mon, 12 Jun 2006, Jon Smirl wrote:
> 
> I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
> any way to tell what it is doing in the kernel for so much time?

oprofile will tell you.

I don't see why it would spend a lot of time in the kernel, unless it's 
the SVN part that does a ton of reads or something. git should have almost 
no kernel footprint apart from the individual objects creation/reading, so 
once it's repacked, I generally see very little system time.

What does top say? (Ie can you see _which_ process spends time in the 
kernel?)
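
If top's combined TIME+ column is not enough, Linux also exposes the
user/system split per process in /proc/<pid>/stat (utime and stime, fields
14 and 15, in clock ticks). A minimal sketch of pulling those two numbers
out of one stat line; stat_times is a hypothetical helper, not an existing
tool:

```shell
# Print "utime stime" (in clock ticks) from one /proc/<pid>/stat line.
stat_times() {
    # Drop everything up to the last ") " so a command name containing
    # spaces or parentheses cannot shift the fields.
    rest=${1##*') '}
    # After the command name, the state letter is field 1, so utime and
    # stime (fields 14 and 15 of the full line) are fields 12 and 13 here.
    set -- $rest
    echo "${12} ${13}"
}
```

Running stat_times "$(cat /proc/14525/stat)" twice, some seconds apart, and
comparing the second number would show how fast system time is accumulating
for that one process.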

		Linus

^ permalink raw reply

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 15:32 UTC (permalink / raw)
  To: linux@horizon.com; +Cc: git, torvalds
In-Reply-To: <20060612043949.20992.qmail@science.horizon.com>

On 12 Jun 2006 00:39:49 -0400, linux@horizon.com <linux@horizon.com> wrote:
> Insanity is copying the data rather than just the file name.  Git is
> good about not reading unnecessary files, and anything necessary should
> be cached, so on-disk fragmentation is not a concern.

I've run a pack and I moved the objects to new directories. Directory
is 746M with 64K files now.

I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
any way to tell what it is doing in the kernel for so much time?

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1  0 599128  24712  38196 247008    0    0     0     0  451   382 12 39 48  0  0
 1  0 599128  24836  38196 246872    0    0     0     4  413   354 15 38 48  0  0
 1  0 599128  24960  38212 246856    0    0     0    64  453   390 15 37 48  0  0
 1  0 599128  24960  38212 246856    0    0     0     0  414   367 12 40 49  0  0
 1  0 599128  23504  38212 248216    0    0     0     0  448   365 13 39 48  0  0
 1  0 599128  24156  38212 247604    0    0     0     0  407   355 13 39 49  0  0
 1  0 599128  25240  38212 246652    0    0     0     0  446   390 13 39 48  0  0
 1  0 599128  25240  38224 246572    0    0     4    48  415   418 12 40 47  0  0
 1  0 599128  25116  38232 246496    0    0     0    12  452   432 12 40 48  0  0

Still doesn't seem to be making much forward progress.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: bisect and gitk happy together
From: Linus Torvalds @ 2006-06-12 15:10 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90606120441p74dd4872y441fe04470f4acd5@mail.gmail.com>



On Mon, 12 Jun 2006, Martin Langhoff wrote:
> 
> - git-bisect visualise wasn't as useful as just a plain gitk. (This
> may be because I was working with ~60 commits in a medium-sized
> project).

Definitely. Try just firing up gitk when you're bisecting a kernel archive 
with thousands of commits, and complex history..

That's when "git bisect visualize" really helps: when git bisect has 
already narrowed down the list of commits from "5 years" to "1 week", but 
you still have maybe a hundred-odd commits to go.
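
To put a rough number on "a hundred-odd commits to go": each bisection step
about halves the candidate range, so the remaining steps grow with the
base-2 logarithm of the commit count. A small sketch, assuming a linear
history (bisect_steps is a made-up helper, not a git command, and git's own
estimate on merge-heavy history can differ):

```shell
# Estimate how many halving steps are needed to narrow N candidate
# commits down to one, assuming each step discards about half of them.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))    # keep the larger half, to be pessimistic
        steps=$((steps + 1))
    done
    echo "$steps"
}
```

"bisect_steps 100" prints 7, so a hundred commits is still about seven
build-and-test rounds, which is enough for the picture to be worth looking
at.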

I agree that just plain "gitk" is actually nicer if you want to see the 
whole context. It's just that often the context is pretty damn confusing ;)

> - gitk didn't show the bad commit tagged specially, even if
> git-bisect had just identified it. Of course I could find it, but I
> had all the other good/bad commits well labelled. And not the one I
> was looking for. Odd.

It should be the head of the "bisect" branch, and naturally tagged that 
way.

			Linus

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Linus Torvalds @ 2006-06-12 14:59 UTC (permalink / raw)
  To: Florian Forster; +Cc: Martin Langhoff, git
In-Reply-To: <20060612082448.GA11857@verplant.org>



On Mon, 12 Jun 2006, Florian Forster wrote:
> 
> Would it help to cache `git-annotate's output, e.g. using one of the
> `Cache::Cache' modules? Or is browsing of blobs too sparse for this to
> result in a performance gain? I'm sure the modules could be integrated
> as a weak precondition.

The apache setup at least on kernel.org is already set up to do caching, 
as long as the generated headers for the page allow it in the first place.

So caching inside gitweb is generally pointless, at least when it's at the 
level of one result page. At a higher level, if the internal caching might 
improve performance of _other_ pages because it caches the result of some 
intermediate important thing, it might be a different issue.

		Linus

^ permalink raw reply

* [PATCH] cvsimport: keep one index per branch during import
From: Martin Langhoff @ 2006-06-12 11:50 UTC (permalink / raw)
  To: junkio, git; +Cc: Martin Langhoff

With this patch we have a speedup and much lower IO when
importing trees with many branches. Instead of forcing
index re-population for each branch switch, we keep
many index files around, one per branch.

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>

---

This patch should get some review. It is trivial, but not fully tested.
I am testing it on the moz repo (which will take a while) to check that I get
the same result with and without it. 

Performance-wise, it seems to be doing ~15K commits per hour, with
the mozilla repo, up from ~6Kcph on the same hardware. Of course, 
this is only noticeable in projects with lots of concurrent branches.
Linear projects don't get much from this patch.
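
For readers who do not speak Perl, the idea can be sketched in shell. This
is only an illustration, not the code in the patch: INDEX_DIR,
index_for_branch and switch_branch are made-up names, and it assumes a git
with the read-tree subcommand on the PATH.

```shell
# Hypothetical directory that holds one index file per branch.
INDEX_DIR=${INDEX_DIR:-${TMPDIR:-/tmp}/cvsimport-index}

# Map a branch name to its private index file.
index_for_branch() {
    # Branch names may contain "/", so flatten them for use as a file name.
    # (Names that differ only in "/" versus "_" would collide in this sketch.)
    printf '%s/%s.idx\n' "$INDEX_DIR" "$(printf '%s' "$1" | tr '/' '_')"
}

# Point git at the branch's own index, filling it only on first use.
switch_branch() {
    mkdir -p "$INDEX_DIR" || return 1
    GIT_INDEX_FILE=$(index_for_branch "$1")
    export GIT_INDEX_FILE
    if [ ! -e "$GIT_INDEX_FILE" ]; then
        # First visit: populate the new index from the branch's tree.
        git read-tree "$1" || return 1
    fi
    # Later visits reuse the existing file, which is where the I/O is saved.
}
```

Switching back to a branch that was already visited then costs one
environment variable change instead of a full index rebuild.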

With this change, we are now truly waiting on cvs to hand over the
files pronto! Running locally, it is apparent that it isn't IO wait
but the latency of the chatty cvs protocol that is making this slow.

Probably forking 2 or 3 processes to prefetch filerevs from cvs
and put them in a queue directory for the main process to pick
up would work wonders. Actually, they could call git-hash-object
and just put some file metadata in the queue directory. 
---
 git-cvsimport.perl |   37 ++++++++++++++++++++++++++++++-------
 1 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/git-cvsimport.perl b/git-cvsimport.perl
old mode 100755
new mode 100644
index 76f6246..9c4588f
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -465,10 +465,15 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
 my $orig_git_index;
 $orig_git_index = $ENV{GIT_INDEX_FILE} if exists $ENV{GIT_INDEX_FILE};
-my ($git_ih, $git_index) = tempfile('gitXXXXXX', SUFFIX => '.idx',
-				    DIR => File::Spec->tmpdir());
-close ($git_ih);
-$ENV{GIT_INDEX_FILE} = $git_index;
+
+my %index; # holds filenames of one index per branch
+{   # init with an index for origin
+    my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+			     DIR => File::Spec->tmpdir());
+    close ($fh);
+    $index{$opt_o} = $fn;
+}
+$ENV{GIT_INDEX_FILE} = $index{$opt_o};
 unless(-d $git_dir) {
 	system("git-init-db");
 	die "Cannot init the GIT db at $git_tree: $?\n" if $?;
@@ -496,6 +501,13 @@ unless(-d $git_dir) {
 	$tip_at_start = `git-rev-parse --verify HEAD`;
 
 	# populate index
+	unless ($index{$last_branch}) {
+	    my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+				     DIR => File::Spec->tmpdir());
+	    close ($fh);
+	    $index{$last_branch} = $fn;
+	}
+	$ENV{GIT_INDEX_FILE} = $index{$last_branch};
 	system('git-read-tree', $last_branch);
 	die "read-tree failed: $?\n" if $?;
 
@@ -776,8 +788,17 @@ while(<CVS>) {
 		}
 		if(($ancestor || $branch) ne $last_branch) {
 			print "Switching from $last_branch to $branch\n" if $opt_v;
-			system("git-read-tree", $branch);
-			die "read-tree failed: $?\n" if $?;
+			unless ($index{$branch}) {
+			    my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+						     DIR => File::Spec->tmpdir());
+			    close ($fh);
+			    $index{$branch} = $fn;
+			    $ENV{GIT_INDEX_FILE} = $index{$branch};
+			    system("git-read-tree", $branch);
+			    die "read-tree failed: $?\n" if $?;
+			} else {
+			    $ENV{GIT_INDEX_FILE} = $index{$branch};
+		        }
 		}
 		$last_branch = $branch if $branch ne $last_branch;
 		$state = 9;
@@ -841,7 +862,9 @@ #	VERSION:1.96->1.96.2.1
 }
 commit() if $branch and $state != 11;
 
-unlink($git_index);
+foreach my $git_index (values %index) {
+    unlink($git_index);
+}
 
 if (defined $orig_git_index) {
 	$ENV{GIT_INDEX_FILE} = $orig_git_index;
-- 
1.4.0.g5fba

^ permalink raw reply related

* bisect and gitk happy together
From: Martin Langhoff @ 2006-06-12 11:41 UTC (permalink / raw)
  To: git

I was using git-bisect earlier today, and at the exact point where it
told me about the bad commit, I opened gitk, which was showing all the
bad and good commits. It is great!

Two "user" notes, however:

 - git-bisect visualise wasn't as useful as just a plain gitk. (This
may be because I was working with ~60 commits in a medium-sized
project).

 - gitk didn't show the bad commit tagged specially, even if
git-bisect had just identified it. Of course I could find it, but I
had all the other good/bad commits well labelled. And not the one I
was looking for. Odd.

In any case, the bisect + gitk combo saved the day. I'm too ashamed to
tell what the bug actually was, though ;-)


martin

^ permalink raw reply

* Re: Collecting cvsps patches
From: Anand Kumria @ 2006-06-12 11:27 UTC (permalink / raw)
  To: git
In-Reply-To: <20060611224205.GF1297@nowhere.earth>

On Mon, 12 Jun 2006 00:42:05 +0200, Yann Dirson wrote:

> http://ydirson.free.fr/soft/git/cvsps.git

I think you need to chmod +x hooks/post-update

and then run 'git-update-server-info'.

Cheers,
Anand

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Shawn Pearce @ 2006-06-12  9:19 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Martin Langhoff, Florian Forster, git
In-Reply-To: <Pine.LNX.4.63.0606121107520.21813@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
> 
> On Mon, 12 Jun 2006, Shawn Pearce wrote:
> 
> >   [gitweb]
> >     description=<div class=\"description\">\n\
> > This is a chunk of text which describes this repository.  Some\n\
> > of this text might be rather long, and might need many lines to\n\
> > really be able to describe the repository in a nice editor such as\n\
> > vi running in an 80 character wide xterm.\n\
> > </div>
> 
> AFAIK the trailing "\" will not work.

Actually it does.  I figured out that it works (and why it works)
when I implemented the GIT repository parser in Java for my pure
Java version of GIT...

For example:

  [spearce@spearce-pb15 bob]$ cat .git/config 
  [core]
          repositoryformatversion = 0
          filemode = true
  [gitweb]
          description = This is a very\nlong line to put into GIT\n\
  repo config.\n\
  I hope it works.
          on = true
  [spearce@spearce-pb15 bob]$ git repo-config gitweb.description
  This is a very
  long line to put into GIT
  repo config.
  I hope it works.
  [spearce@spearce-pb15 bob]$ git repo-config gitweb.on
  true

The use of a trailing \ makes sense; the collapsing of multiple
spaces into one space unless quoted inside of "" doesn't.
But whatever...

-- 
Shawn.

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Johannes Schindelin @ 2006-06-12  9:08 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Martin Langhoff, Florian Forster, git
In-Reply-To: <20060612084056.GA29220@spearce.org>

Hi,

On Mon, 12 Jun 2006, Shawn Pearce wrote:

>   [gitweb]
>     description=<div class=\"description\">\n\
> This is a chunk of text which describes this repository.  Some\n\
> of this text might be rather long, and might need many lines to\n\
> really be able to describe the repository in a nice editor such as\n\
> vi running in an 80 character wide xterm.\n\
> </div>

AFAIK the trailing "\" will not work.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Shawn Pearce @ 2006-06-12  8:40 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Florian Forster, git
In-Reply-To: <46a038f90606120134n21c269bbj3e8c7e31d4d93a23@mail.gmail.com>

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> >- If `GIT_DIR/description' is only used by gitweb it may be more
> >  consistent to use the git-repo-config option `gitweb.description' in
> >  the future.
> 
> Not sure how git-repo configurations deal with long entries. Right now
> the description may contain html for instance.

It has to be escaped, which could be ugly with HTML.  For example:

  [gitweb]
    description=<div class=\"description\">\n\
This is a chunk of text which describes this repository.  Some\n\
of this text might be rather long, and might need many lines to\n\
really be able to describe the repository in a nice editor such as\n\
vi running in an 80 character wide xterm.\n\
</div>

Forget a \ in front of a double quote (") or an LF and the entry is
corrupt.  So as nice as it sounds it might not be the best way to
obtain a description for gitweb.
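
The escaping could at least be automated. A minimal sketch, assuming the
escapes involved are \\, \" and \n plus a trailing backslash to continue
the line; escape_config_value is a hypothetical helper, and wrapping the
whole value in double quotes is an extra precaution of this sketch (so that
"#" or ";" inside the text is not read as the start of a comment), not
something the example above does:

```shell
# Turn arbitrary text into a quoted, escaped config value that may span
# several lines.
escape_config_value() {
    # Escape backslashes first, then double quotes; on every line but the
    # last, append a literal \n and a trailing backslash so the value
    # continues on the next line.
    body=$(printf '%s\n' "$1" |
        sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e '$!s/$/\\n\\/')
    printf '"%s"\n' "$body"
}
```

The output would still have to be placed after "description = " by hand or
by a script, so this only removes the error-prone part.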

-- 
Shawn.

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Martin Langhoff @ 2006-06-12  8:34 UTC (permalink / raw)
  To: Florian Forster; +Cc: git
In-Reply-To: <20060612082448.GA11857@verplant.org>

On 6/12/06, Florian Forster <octo@verplant.org> wrote:
> On Mon, Jun 12, 2006 at 10:02:05AM +1200, Martin Langhoff wrote:
> > good! git-blame/git-annotate are quite expensive to run. Do you think
> > it would make sense making it conditional on a git-repo-config option
> > (gitweb.blame=1)?
>
> sure, that it's a big change and if it helps the kernel.org folks ;)
> I'll follow-up with a patch for this in a second..

That'd be great. I am looking into integrating other feature patches
too (like tarball downloads) that are useful but costly, making them
conditional too...

> Would it help to cache `git-annotate's output, e.g. using one of the

I think we can rely on proxies doing good caching -- a busy host like
kernel.org will have big reverse proxies in front. A git-blame for a
given file+commitsha doesn't change, so we can give it a long cache
time, like... forever ;-)

> I have two more points regarding gitweb's configuration:
> - IMHO it would make sense to move the general gitweb-configuration
>   (where are the repositories, where are the binaries, etc) out of the
>   script.  As far as I know the Debian maintainer of the `gitweb'
>   package has asked for this before but was refused for some reason..

Sounds like a reasonable request. I would make it rely on env vars,
$ENV{GITWEB_CONFIG} can generally point to /etc/gitweb.conf, and that
would override the config values we have.

This is trivial, and it means we buy a lot of flexibility from
apache's httpd.conf being able to point to different config files
depending on arbitrary conditions.
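
The lookup order can be sketched as follows. It is written in shell purely
for illustration (gitweb itself is Perl); gitweb_config_path,
load_gitweb_config and the variable-assignment file format are assumptions
of this sketch, not an existing gitweb interface:

```shell
# GITWEB_CONFIG wins when set and non-empty; otherwise fall back to the
# system-wide path suggested above.
gitweb_config_path() {
    printf '%s\n' "${GITWEB_CONFIG:-/etc/gitweb.conf}"
}

# Apply the overrides on top of the built-in defaults.
load_gitweb_config() {
    conf=$(gitweb_config_path)
    # The file is sourced, so it can run arbitrary code; it must be as
    # trusted as the script itself. A missing file leaves defaults alone.
    [ -r "$conf" ] && . "$conf"
    return 0
}
```

A SetEnv GITWEB_CONFIG line in a given virtual host or directory block of
httpd.conf would then be enough to select a different file per site.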

BTW, I haven't seen the debian maintainer's request, was that on the list?

> - If `GIT_DIR/description' is only used by gitweb it may be more
>   consistent to use the git-repo-config option `gitweb.description' in
>   the future.

Not sure how git-repo configurations deal with long entries. Right now
the description may contain html for instance.



martin

^ permalink raw reply

* [PATCH] gitweb: Make the availability of the `blame' interface in gitweb configurable.
From: Florian Forster @ 2006-06-12  8:31 UTC (permalink / raw)
  To: git; +Cc: Florian Forster
In-Reply-To: <20060612082448.GA11857@verplant.org>

Since `git-annotate' is an expensive operation to run it may be desirable to
deactivate this functionality. This patch introduces the `gitweb.blame' option
to git-repo-config and disables the blame support by default.

Signed-off-by: Florian Forster <octo@verplant.org>


---

 gitweb/gitweb.cgi |   27 +++++++++++++++++++++++++--
 1 files changed, 25 insertions(+), 2 deletions(-)

3eea23e8d8a13579455cdf8d5088794d33bdcba2
diff --git a/gitweb/gitweb.cgi b/gitweb/gitweb.cgi
index 91c075d..5eabe06 100755
--- a/gitweb/gitweb.cgi
+++ b/gitweb/gitweb.cgi
@@ -837,6 +837,25 @@ sub git_read_projects {
 	return @list;
 }
 
+sub git_get_project_config {
+	my $key = shift;
+
+	return unless ($key);
+	$key =~ s/^gitweb\.//;
+	return if ($key =~ m/\W/);
+
+	my $val = qx(git-repo-config --get gitweb.$key);
+	return ($val);
+}
+
+sub git_get_project_config_bool {
+	my $val = git_get_project_config (@_);
+	if ($val and $val =~ m/^(?:true|yes|on)$/) {
+		return (1);
+	}
+	return; # implicit false
+}
+
 sub git_project_list {
 	my @list = git_read_projects();
 	my @projects;
@@ -1233,6 +1252,7 @@ sub git_tag {
 
 sub git_blame {
 	my $fd;
+	die_error('403 Permission denied', "Permission denied.") if (!git_get_project_config_bool ('blame'));
 	die_error('404 Not Found', "What file will it be, master?") if (!$file_name);
 	$hash_base ||= git_read_head($project);
 	die_error(undef, "Reading commit failed.") unless ($hash_base);
@@ -1468,6 +1488,7 @@ sub git_blob {
 		my $base = $hash_base || git_read_head($project);
 		$hash = git_get_hash_by_path($base, $file_name, "blob") || die_error(undef, "Error lookup file.");
 	}
+	my $have_blame = git_get_project_config_bool ('blame');
 	open my $fd, "-|", "$gitbin/git-cat-file blob $hash" or die_error(undef, "Open failed.");
 	git_header_html();
 	if (defined $hash_base && (my %co = git_read_commit($hash_base))) {
@@ -1479,8 +1500,10 @@ sub git_blob {
 		      " | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=commitdiff;h=$hash_base")}, "commitdiff") .
 		      " | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=tree;h=$co{'tree'};hb=$hash_base")}, "tree") . "<br/>\n";
 		if (defined $file_name) {
-			print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blame;h=$hash;hb=$hash_base;f=$file_name")}, "blame") .
-			" | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob_plain;h=$hash;f=$file_name")}, "plain") .
+			if ($have_blame) {
+				print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blame;h=$hash;hb=$hash_base;f=$file_name")}, "blame") .  " | ";
+			}
+			print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob_plain;h=$hash;f=$file_name")}, "plain") .
 			" | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob;hb=HEAD;f=$file_name")}, "head") . "<br/>\n";
 		} else {
 			print $cgi->a({-href => "$my_uri?" . esc_param("p=$project;a=blob_plain;h=$hash")}, "plain") . "<br/>\n";
-- 
1.3.3

^ permalink raw reply related

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Florian Forster @ 2006-06-12  8:24 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90606111502g607be3cfnf83ce81764a5f909@mail.gmail.com>

Hi Martin,

On Mon, Jun 12, 2006 at 10:02:05AM +1200, Martin Langhoff wrote:
> good! git-blame/git-annotate are quite expensive to run. Do you think
> it would make sense making it conditional on a git-repo-config option
> (gitweb.blame=1)?

sure, that it's a big change and if it helps the kernel.org folks ;)
I'll follow-up with a patch for this in a second..

Would it help to cache `git-annotate's output, e.g. using one of the
`Cache::Cache' modules? Or is browsing of blobs too sparse for this to
result in a performance gain? I'm sure the modules could be integrated
as a weak precondition.

I have two more points regarding gitweb's configuration:
- IMHO it would make sense to move the general gitweb-configuration
  (where are the repositories, where are the binaries, etc) out of the
  script.  As far as I know the Debian maintainer of the `gitweb'
  package has asked for this before but was refused for some reason..
  Possibly a file `gitweb.conf' in the same directory as the script
  could be read and overwrite the builtin defaults..?
- If `GIT_DIR/description' is only used by gitweb it may be more
  consistent to use the git-repo-config option `gitweb.description' in
  the future.

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/


^ permalink raw reply

* Re: git-applymbox broken?
From: Eric W. Biederman @ 2006-06-12  7:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606111735440.5498@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 11 Jun 2006, Eric W. Biederman wrote:
>> 
>> This doesn't look like the From: header was in the middle of the
>> message until it was imported into git so it is probably a small
>> logic error that is easily corrected.  But I need to see what
>> we are parsing so I can understand what is happening.
>
> No, it's at the top of the body, although there might have been an empty 
> line or two (ie whitespace only) before it.

Ok.  I'm not certain why we would not be ignoring blank lines that
we used to skip.  The untested patch below should ensure we always
skip those lines.


>> Even if the header lines are in the middle of the body?
>
> What do you mean by "middle"?
>
> No, it should only look at From: and Subject: lines if they are at the 
> very top, with no other non-whitespace lines above them. But when it looks 
> at them and uses the data from them, it should then remove them from the 
> body - they are "conceptually" just extended header lines that just 
> happened to technically (from an rfc822 standpoint) be in the body of the 
> email.

This is a separate conversation and once the problem of not ignoring leading
blank lines is fixed I will be happy to address it.

Eric

diff --git a/mailinfo.c b/mailinfo.c
index 5b6c215..72c5454 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -279,6 +279,14 @@ static void handle_inbody_header(int *se
                        return;
                }
        }
+       /* Ignore leading blank lines */
+       if (!(*seen & SEEN_PREFIX)) {
+               char *ch;
+               for (ch = line; isspace(*ch); ch++)
+                       ;
+               if (*ch == '\0')
+                       return;
+       }
        *seen |= SEEN_PREFIX;
 }

^ permalink raw reply related

* Re: svn to git, N-squared?
From: linux @ 2006-06-12  4:39 UTC (permalink / raw)
  To: git, jonsmirl, torvalds

>	for i in ??
>	do
>		cp -r $i $i.new
>		rm -rf $i
>		mv $i.new $i
>	done
>
> in your .git/objects/ directory (CAREFUL! Any script that does "rm -rf" 
> should be double- and triple-checked for sanity! ;)

Insanity is copying the data rather than just the file name.  Git is
good about not reading unnecessary files, and anything necessary should
be cached, so on-disk fragmentation is not a concern.

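rmdir --ignore-fail-on-non-empty ??	# Drop empty fan-out dirs; probably unnecessary.
for i in ??
do
	# Rename the files into a fresh directory; stop at the first failure.
	mkdir "$i.new" &&
	mv "$i"/* "$i.new" &&
	rmdir "$i" &&
	mv "$i.new" "$i" || break
done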

^ permalink raw reply

* git-diff --cc broken in 1.4.0?
From: Martin Langhoff @ 2006-06-12  4:32 UTC (permalink / raw)
  To: git

I was looking at some merges in gitk and lamenting the apparent loss
of the nice two-sided diff we get with --cc, and now during a slightly
messy merge I did git-diff --cc only to get...

$ git-ls-files --unmerged
100644 f1d3843b2b2e42ba78adcf37da6440f0d321852e 1       local/version.php
100644 9352efa45cd25d9ad58df12b4ac241ac226a8ad4 2       local/version.php
100644 50da9b47903f6179f55a3f44290e7feaa08342f4 3       local/version.php

$ git-diff --cc
diff --cc local/version.php
index 9352efa,50da9b4..0000000
--- a/local/version.php
+++ b/local/version.php

cheers,


martin

^ permalink raw reply

* Re: svn to git, N-squared?
From: Eric Wong @ 2006-06-12  4:29 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910606111902l709c71ccyf45070d55112739e@mail.gmail.com>

Jon Smirl <jonsmirl@gmail.com> wrote:
> I have Mozilla CVS in a SVN repository. I've been using git-svnimport
> to import it. This time I am letting it run to completion; but the
> import has been running for four days now and it is only up to 2004.
> The import task is stable at 570MB and it is using about 50% of my
> CPU. It is constantly spawning off git write-tree, read-tree,
> hash-object, update-index. It is not doing excessive disk activity.

SVN itself seems to get much slower as you get towards newer revisions
in a repository (FSFS) with lots of history.  I've been experimenting a
bit with a local copy of the gcc repo from November and git-svn SUCKED
at importing it (it took over a week and I cancelled it out of
frustration).  I started repacking too, but it didn't help.  Much
of the performance deficiency was the svn subprocess being extremely
slow at updating.

I also tried git-svnimport, of course, but I only had 512M on that
machine and the machine became unusable due to heavy swapping.

> The import seems to be getting n-squared slower. It is still making
> forward progress but the progress seems to be getting slower and
> slower.
> 
> It looks like it is doing write-tree, read-tree, hash-object,
> update-index once or more per change set. If these commands are
> n-proportional and they are getting run n times, then this is a
> n-squared process. Projecting this out, the import may take 10 days or
> more to completely finish.
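
The arithmetic behind that projection can be written down. Assuming, purely
for illustration, that importing the k-th changeset costs k units of work,
the total for n changesets is 1 + 2 + ... + n = n(n+1)/2. total_cost below
is a made-up helper for that sum, not a measurement of svnimport:

```shell
# Total work for n changesets when changeset k costs k units.
total_cost() {
    echo $(( $1 * ($1 + 1) / 2 ))
}
```

"total_cost 1000" prints 500500 and "total_cost 2000" prints 2001000, so
under this assumption doubling the history roughly quadruples the work,
which matches the slowdown described above.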

I'm working on some improvements to git-svn to make it a bit more
spiffy.

-- 
Eric Wong

^ permalink raw reply

