Git development


* Re: Thoughts on adding another hook to git
From: Yakov Lerner @ 2006-06-12 19:06 UTC
  To: David Kowis; +Cc: git
In-Reply-To: <448DB201.5090208@shlrm.org>

On 6/12/06, David Kowis <dkowis@shlrm.org> wrote:
> I'd like to be able to modify the commit message before it ends up in
> the $EDITOR.

Can't you point $EDITOR at a script that modifies the file
as you wish and then calls the real editor on it?

Yakov
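
A minimal sketch of the wrapper idea above, in Python. The banner text and
the REAL_EDITOR environment variable are assumptions made for illustration;
nothing here is part of git itself.

```python
import os
import subprocess
import sys


def prepare_message(path, banner="# (hypothetical) text added before editing\n"):
    """Prepend `banner` to the message file unless it is already there."""
    with open(path, "r", encoding="utf-8") as fh:
        text = fh.read()
    if not text.startswith(banner):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(banner + text)


def main(argv):
    path = argv[1]
    prepare_message(path)
    # REAL_EDITOR is an assumed variable naming the editor to hand over to.
    real_editor = os.environ.get("REAL_EDITOR", "vi")
    return subprocess.call([real_editor, path])


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as an executable script, it would be used by pointing $EDITOR at the
script and REAL_EDITOR at the editor that should finally open the file.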

* Re: svn to git, N-squared?
From: Yakov Lerner @ 2006-06-12 19:04 UTC
  To: Linus Torvalds; +Cc: Jon Smirl, git
In-Reply-To: <Pine.LNX.4.64.0606112056440.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
> On Sun, 11 Jun 2006, Jon Smirl wrote:
> > I have it stopped and I am running the repack.
> > There are 1.27M files in my .git directory
> Yeah, that would do it. That's ~5000 files per object directory, so I
> assume that your directories are 200+kB in size, and for every new object
> added, you'll basically have to traverse the old directory fully in order
> to find an empty place for it

Is this related to the 1-level directory tree for objects (12/object)
vs. a 2-level tree (12/34/object)? Does git use more levels in the
object tree for large projects?

Yakov
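
For illustration, a small Python sketch of the fan-out arithmetic in
question. Git's loose object store uses one level (the first two hex digits
of the object name); the `levels` parameter and both function names are
hypothetical, and the per-directory figure assumes uniformly distributed
hashes.

```python
def loose_object_path(sha1_hex, levels=1):
    """Relative path of a loose object for a given fan-out depth."""
    parts = [sha1_hex[2 * i:2 * i + 2] for i in range(levels)]
    parts.append(sha1_hex[2 * levels:])
    return "/".join(parts)


def files_per_directory(total_objects, levels=1):
    """Average files per leaf directory, assuming uniform hashes."""
    return total_objects / (256 ** levels)


if __name__ == "__main__":
    sha = "8b4525fb3c6d79bd3a64b8f441237a4095db4e22"
    print(loose_object_path(sha, levels=1))
    print(loose_object_path(sha, levels=2))
    print(files_per_directory(1_270_000, levels=1))
```

With 1.27M objects and one level this gives about 4961 files per directory,
which matches the "~5000 files per object directory" estimate quoted above.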

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 19:00 UTC
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606121106ta925b6er49fe68bf3c1031f5@mail.gmail.com>

On 6/12/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > Having that many files in a single directory (or two) is a total disaster.
> > That said, it works well enough if you don't create new files very often
> > (and _preferably_ don't look them up either, although that is effectively
> > helped by indexing). I _suspect_ that
>
> Posted to the svn list, they said that 220K files is normal. They told
> me to turn on the ext2 dir_index option. Checking my system I see that
> none of the partitions have it turned on, so it must not be the default
> for FC5.
>
> I have to unmount the drive to convert existing directories. I can
> try doing the file move trick while the process is running, since
> new directories will use it.

I converted the ext3 directories to dir_index on the fly using the
move trick. Switching the directory index makes it look like it is
spending even more time in the kernel.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1  0 636188  22380  19176 157200    0    0     0    52  436   415 13 40 48  0  0
 1  0 636188  22504  19176 157200    0    0     0     0  430   373 13 38 49  0  0
 1  0 636188  22628  19176 157064    0    0     0     0  433   380 12 39 49  0  0
 1  0 636188  22628  19184 157056    0    0     0    20  434   390 12 38 49  0  0
 1  0 636188  22628  19184 156920    0    0     0     0  431   376 11 40 49  0  0
 1  0 636188  22752  19192 156912    0    0     0    48  437   376 12 40 49  0  0
 1  0 636188  22876  19192 156912    0    0     0     0  430   386 11 40 49  0  0
 1  0 636188  22752  19192 156776    0    0     0     0  431   370 10 41 49  0  0
 1  0 636188  23016  19192 156776    0    0     8     0  422   500 22 40 37  2  0

The size of the svn directories went from 3.2MB to 4.4MB after they
were converted to ext3 indexed mode.

I'll get oprofile running when I do a reboot.

-- 
Jon Smirl
jonsmirl@gmail.com
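
As a reading aid for the vmstat samples in this message: `us`, `sy` and `id`
are the percentages of CPU time spent in user code, in the kernel, and idle.
The sketch below averages those columns over the nine samples; the function
and constant names are made up for this example.

```python
VMSTAT_OUTPUT = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1  0 636188  22380  19176 157200    0    0     0    52  436   415 13 40 48  0  0
 1  0 636188  22504  19176 157200    0    0     0     0  430   373 13 38 49  0  0
 1  0 636188  22628  19176 157064    0    0     0     0  433   380 12 39 49  0  0
 1  0 636188  22628  19184 157056    0    0     0    20  434   390 12 38 49  0  0
 1  0 636188  22628  19184 156920    0    0     0     0  431   376 11 40 49  0  0
 1  0 636188  22752  19192 156912    0    0     0    48  437   376 12 40 49  0  0
 1  0 636188  22876  19192 156912    0    0     0     0  430   386 11 40 49  0  0
 1  0 636188  22752  19192 156776    0    0     0     0  431   370 10 41 49  0  0
 1  0 636188  23016  19192 156776    0    0     8     0  422   500 22 40 37  2  0
"""


def mean_columns(vmstat_text, columns=("us", "sy", "id")):
    """Average the named columns over all sample rows of vmstat output."""
    lines = [line.split() for line in vmstat_text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    means = {}
    for name in columns:
        position = header.index(name)  # locate the column by its header label
        means[name] = sum(int(row[position]) for row in rows) / len(rows)
    return means


if __name__ == "__main__":
    for name, value in mean_columns(VMSTAT_OUTPUT).items():
        print(f"{name}: {value:.1f}")
```

Over these samples, kernel time averages close to 40% against roughly 13%
user time, which is the imbalance being discussed in this thread.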

* Re: git-applymbox broken?
From: Eric W. Biederman @ 2006-06-12 18:58 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606111735440.5498@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> What do you mean by "middle"?
>
> No, it should only look at From: and Subject: lines if they are at the 
> very top, with no other non-whitespace lines above them. But when it looks 
> at them and uses the data from them, it should then remove them from the 
> body - they are "conceptually" just extended header lines that just 
> happened to technically (from an rfc822 standpoint) be in the body of the 
> email.

Below is an example of the kind of patch that inspired me to relax the
rules on parsing in-body headers (this comes from Andi Kleen's quilt tree).

The first line in this instance is obviously a subject line, but there
is no really good way to detect that.  Then we get a From: line.

Now I doubt any patches ever hit the mail in this format and it probably
isn't worth it to track down every variation of patch headers in existence.
But if we don't find a From: header in the body prefix it seems to make
sense to keep looking for headers in the body, and to use the information
if we find it.

---
Kdump i386 nmi event notification fix

From: Vivek Goyal <vgoyal@in.ibm.com>

After a crash we should wait for NMI IPI event and not for external NMI or
NMI watchdog tick.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 arch/i386/kernel/crash.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/crash.c
===================================================================
--- linux.orig/arch/i386/kernel/crash.c
+++ linux/arch/i386/kernel/crash.c
@@ -102,7 +102,7 @@ static int crash_nmi_callback(struct not
 	struct pt_regs fixed_regs;
 	int cpu;

-	if (val != DIE_NMI)
+	if (val != DIE_NMI_IPI)
 		return NOTIFY_OK;

 	regs = ((struct die_args *)data)->regs;
@@ -113,7 +113,7 @@ static int crash_nmi_callback(struct not
 	 * an NMI if system was initially booted with nmi_watchdog parameter.
 	 */
 	if (cpu == crashing_cpu)
-		return 1;
+		return NOTIFY_STOP;
 	local_irq_disable();

 	if (!user_mode_vm(regs)) {

* [PATCH] Ignore blank lines among the in-body headers.
From: Eric W. Biederman @ 2006-06-12 18:45 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606111735440.5498@g5.osdl.org>


This is a fix for a regression introduced in:
8b4525fb3c6d79bd3a64b8f441237a4095db4e22.

When I refactored the inbody header parsing into a state machine I failed
to see the logic that skipped multiple leading spaces if they are present.
I think I assumed that logic was just there to skip the initial blank
line between the mail headers and the body.

This restores that behaviour and since we ignore all leading blank lines
in commit messages now this code removes the special case for the blank
line between the mail headers and the body.
---
 mailinfo.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 5b6c215..3696d61 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -229,6 +229,14 @@ static int is_multipart_boundary(const c
 	return (!memcmp(line, multipart_boundary, multipart_boundary_len));
 }
 
+static int is_blank(char *line)
+{
+	char *ch;
+	for (ch = line; isspace(*ch); ch++)
+		;
+	return *ch == '\0';
+}
+
 static int eatspace(char *line)
 {
 	int len = strlen(line);
@@ -243,7 +251,7 @@ #define SEEN_SUBJECT 04
 #define SEEN_BOGUS_UNIX_FROM 010
 #define SEEN_PREFIX  020
 
-/* First lines of body can have From:, Date:, and Subject: */
+/* First lines of body can have From:, Date:, and Subject: or be blank */
 static void handle_inbody_header(int *seen, char *line)
 {
 	if (!memcmp(">From", line, 5) && isspace(line[5])) {
@@ -279,6 +287,10 @@ static void handle_inbody_header(int *se
 			return;
 		}
 	}
+	if (isspace(line[0])) {
+		if (!(*seen & SEEN_PREFIX) && is_blank(line))
+			return;
+	}
 	*seen |= SEEN_PREFIX;
 }
 
@@ -420,9 +432,7 @@ static int read_one_header_line(char *li
 		if (fgets(line + ofs, sz - ofs, in) == NULL)
 			break;
 		len = eatspace(line + ofs);
-		if (len == 0)
-			break;
-		if (!is_rfc2822_header(line)) {
+		if ((len == 0) || !is_rfc2822_header(line)) {
 			/* Re-add the newline */
 			line[ofs + len] = '\n';
 			line[ofs + len + 1] = '\0';
@@ -762,10 +772,8 @@ static void handle_body(void)
 {
 	int seen = 0;
 
-	if (line[0] || fgets(line, sizeof(line), stdin) != NULL) {
-		handle_commit_msg(&seen);
-		handle_patch();
-	}
+	handle_commit_msg(&seen);
+	handle_patch();
 	fclose(patchfile);
 	if (!patch_lines) {
 		fprintf(stderr, "No patch found\n");
-- 
1.4.0.rc2.g5e3a6
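
The behaviour the patch above restores can be sketched in Python as follows.
This is a simplified illustration, not a port of mailinfo.c: it ignores the
">From" case and the per-header "seen" flags, and the function name is
invented for the example.

```python
HEADER_NAMES = ("From:", "Date:", "Subject:")


def split_inbody_headers(body_lines):
    """Split a mail body into (in-body headers, remaining message lines).

    Blank lines before and among the in-body headers are skipped. Parsing
    stops at the first line that is neither blank nor a recognised header.
    """
    headers = {}
    index = 0
    for index, line in enumerate(body_lines):
        stripped = line.strip()
        if not stripped:
            continue  # blank line before the message proper: ignore it
        if stripped.startswith(HEADER_NAMES):
            name, _, value = stripped.partition(":")
            headers[name] = value.strip()
            continue
        break  # first real line of the commit message
    else:
        index = len(body_lines)  # nothing but blanks and headers
    return headers, body_lines[index:]


if __name__ == "__main__":
    body = [
        "",
        "From: Vivek Goyal <vgoyal@in.ibm.com>",
        "",
        "After a crash we should wait for NMI IPI event.",
    ]
    print(split_inbody_headers(body))
```

Once the first ordinary line has been seen, later blank lines are left in
place, so paragraph breaks inside the commit message survive.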

* Thoughts on adding another hook to git
From: David Kowis @ 2006-06-12 18:27 UTC
  To: git

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

I'd like to be able to modify the commit message before it ends up in
the $EDITOR. This is a fairly trivial thing to implement:
Call ${GIT_DIR}/hooks/pre-editor on COMMIT_MESSAGE before opening it in
$EDITOR.

My question to you all is should I set it up so that the hook only opens
when the $EDITOR is actually being called? (really easy)
Or, do I set it up so that the hook always happens. In which case it's
similar to the commit-msg hook already, just happens before the message
instead of after.

Thanks,
- --
David Kowis

ISO Team Lead - www.sourcemage.org
Source Mage GNU/Linux

Progress isn't made by early risers. It's made by lazy men trying to
find easier ways to do something.
  - Robert Heinlein
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32)

iQGVAwUBRI2yAMnf+vRw63ObAQpNSgv+OCXYSDlW96K9M5QZvSaEbdZOGorYZg5Y
RSh9WUXS2ribYRr1TbplD0Fp4vGnG8CB6qvr2QF8vP3tbEMjnwk4LobeWaUtK2Kn
Hja3TgIUPWkzHMLleToe5o99r8v/6LFf9rkBxvFw3TMkuxsFS/lFlxy1eRa43rvd
Skod2cA7RWus1IFJcbDKNonjhJkVkHylSMjT8iVQDbgY0hg7PEy2ZW3XB0MJJRZC
lLsDDIJ4msPCXSx/lDRGaJj+m7IrvUgnEDzkX0jTT8DeZqnlC8nRM/2dOS72b/5w
gIBYu49DvTL8ynod2mmYTyBynfRpVxPjxnXbubn/M+N+0WCTXIUTPCbyW2MOscjA
pFe6/S1qKaTqc06VBDabYxdvGrHG6v+KkaJhu2XoLOHWVoBblobBBNrpIkA6GNqz
H7JHNJDF+JbshlW2aU2HazDINRfD/AfrJmDx4Xn91qAKiegyO3wRA1rM6a0LEpun
zg3haF3l0rfBEdFpz21gNQbYxNHaRkwg
=Rxm/
-----END PGP SIGNATURE-----

* Re: gitweb: Config file support (was: Adding a `blame' interface.)
From: Florian Forster @ 2006-06-12 18:11 UTC
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90606120134n21c269bbj3e8c7e31d4d93a23@mail.gmail.com>

Hi Martin,

On Mon, Jun 12, 2006 at 08:34:43PM +1200, Martin Langhoff wrote:
> > As far as I know the Debian maintainer of the `gitweb' package has
> > asked for this before but was refused for some reason..
> BTW, I haven't seen the debian maintainer's request, was that on the list?

Yes, it was a mail by Andres Salomon on May 20th, 2005 with the subject
`add conf file support to gitweb'. A friend of mine asked him if he had
sent the patch upstream and he pointed to this message and explained he
had gotten a private reply saying that gitweb `only covers the special
needs on kernel.org'.

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 18:06 UTC
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120958230.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Having that many files in a single directory (or two) is a total disaster.
> That said, it works well enough if you don't create new files very often
> (and _preferably_ don't look them up either, although that is effectively
> helped by indexing). I _suspect_ that

Posted to the svn list, they said that 220K files is normal. They told
me to turn on the ext2 dir_index option. Checking my system I see that
none of the partitions have it turned on, so it must not be the default
for FC5.

I have to unmount the drive to convert existing directories. I can
try doing the file move trick while the process is running, since
new directories will use it.

-- 
Jon Smirl
jonsmirl@gmail.com

* Re: [PATCH] gitweb: Supporting caches (was: Adding a `blame' interface.)
From: Florian Forster @ 2006-06-12 17:57 UTC
  To: Linus Torvalds; +Cc: Martin Langhoff, git
In-Reply-To: <Pine.LNX.4.64.0606120754460.5498@g5.osdl.org>

On Mon, Jun 12, 2006 at 07:59:39AM -0700, Linus Torvalds wrote:
> The apache setup at least on kernel.org is already set up to do
> caching, as long as the generated headers for the page allow it in the
> first place.

I've actually looked into improving native HTTP caching (mostly for
small sites without reverse proxying) by providing a `Last-Modified'
header where possible and sending a `304 Not Modified' whenever
appropriate.

While it doesn't sound hard, it's next to impossible: a commit's
timestamp doesn't change when a head points to it (or no longer
points to it). Also, displaying the timestamps as `Modified xy
{seconds, minutes, hours, ...} ago' poses a big problem.

(I guess the webserver could use the `If-Modified-Since' header to check
if the displayed time needs to be updated, but if you ask me it's not
worth the effort.)

In short, the `blob', `blob_plain', and `blobdiff' pages could profit
from that because they don't display the head(s) pointing to the current
commit. On the other hand, this is a little inconsistent and could be
considered a bug. So I'll give up on that unless someone has a great
idea how to handle this.

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/
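
For reference, the `Last-Modified' / `304 Not Modified' exchange described
in this message can be sketched in Python as below. The function name is
hypothetical and this is not gitweb code; the hard part raised above, namely
choosing a trustworthy `last_modified` for a page that also shows heads and
relative times, is deliberately left to the caller.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` should be a timezone-aware datetime; `if_modified_since`
    is the raw request header value, if the client sent one.
    """
    # HTTP dates have one-second resolution and are expressed in GMT.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable header: send the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers
    return 200, headers


if __name__ == "__main__":
    changed = datetime(2006, 6, 12, 17, 57, tzinfo=timezone.utc)
    status, headers = conditional_response(changed)
    print(status, headers)
    print(conditional_response(changed, headers["Last-Modified"])[0])
```

A page such as `blob_plain' could pass the commit timestamp here, subject to
the consistency concern raised above.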

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12 17:08 UTC
  To: Jon Smirl; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120944p4deb170ejc2863846685917f6@mail.gmail.com>

On Mon, 12 Jun 2006, Jon Smirl wrote:
> 
> The svn repository was built by cvs2svn, none of the git tools were involved.

Ok, so that part is purely a SVN issue.

Having that many files in a single directory (or two) is a total disaster. 
That said, it works well enough if you don't create new files very often 
(and _preferably_ don't look them up either, although that is effectively 
helped by indexing). I _suspect_ that 

 - the "cvs->svn" import process was probably optimized so that it did one 
   file at a time (your "eight stages" description certainly sounds as if 
   it could do it), and in that case it's entirely possible that that can 
   be done efficiently (ie you still do file creates and lookups in an 
   increasingly big directory, but you do it only _once_ per file, rather 
   than look up old files all the time). So your lookup ratio would be 1:1 
   with the files.

   Doing a git-cvsimport would then do basically random lookups in that 
   _huge_ directory, and instead of reading the files one at a time (and 
   fully) and never again, I assume it opens them, reads one revision, 
   closes it, and then goes on to the next revision, so it will have a 
   much higher lookup ratio (you'd look up every file several times).

 - I suspect the SVN people must be hurting for performance themselves. I 
   guess they don't expect to be able to do 5-10 commits per second, the 
   way git was designed to do. So they optimized the cvs import part, but 
   their actual regular live usage is probably hitting this same directory 
   inefficiency.

Of course, the old SVN Berkeley DB usage was probably even worse (not in 
system time, but I'd expect the access patterns within the BDB file to be 
pretty nasty, and probably a lot of user time spent seeking around it). 
But in this particular case, it might even have been better.

Maybe we could teach the SVN people about pack-files? ;)

			Linus
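
To make the "lookup ratio" argument above concrete, here is a toy cost model
in Python. It assumes a directory that is searched linearly, with no index
and no caching, so it is an illustration of the scaling only and not a
measurement of ext3; both function names are invented.

```python
def create_scans(n_files):
    """Directory entries examined while creating n files one after another.

    Creating file number k first scans the k - 1 entries already present.
    """
    return n_files * (n_files - 1) // 2


def lookup_scans(n_files, lookups_per_file):
    """Entries examined for repeated lookups in a directory of n files.

    Each lookup is modelled as scanning half of the directory on average.
    """
    return n_files * n_files * lookups_per_file // 2


if __name__ == "__main__":
    n = 411_000
    print("create once:      ", create_scans(n))
    print("1 lookup per file:", lookup_scans(n, 1))
    print("5 lookups per file:", lookup_scans(n, 5))
```

Both terms grow with the square of the file count, and the second one is
multiplied again by how often each file is revisited, which is the
difference between a one-pass import and a per-revision one.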

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12 16:57 UTC
  To: Jon Smirl; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120932k5b6f7acfra3f3a26168454f47@mail.gmail.com>

On Mon, 12 Jun 2006, Jon Smirl wrote:
> > 
> > 64 files in tmp.
> > But the SVN repository itself has 411,000 files in it. Split between
> > two directories.
> 
> I'm doing all of this on ext3. I have plenty of free disk space so I
> can make another partition and switch to a new file system after I
> install the new RAM. What would be the best one to try? Doing that
> would provide a data point to determine if this is a problem with file
> system performance or the misuse of file systems.

I'm sure there are better filesystems to try for this kind of insane
scenario, but at the same time, I really cannot imagine that the 411,000
files is a "normal" thing. There _must_ be some way to have SVN not do
that in the first place (or git-svnimport).

Is this what happened when the SVN people started using fsfs? 

			Linus

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 16:44 UTC
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120938490.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > Is there some pack equivalent for svn that I haven't found yet?
>
> Is this literally what SVN does normally? That's just insane. I mean, even
> git tried to at least hash out the files (and yeah, admittedly even that
> worked less well than I was hoping for, but I at least fixed it within
> just a few weeks through the pack mechanism).
>
> Or is that 411,000 files a result of how git-svnimport does things, rather
> than some basic SVN approach to life: does it perhaps end up checking out
> each file under an individual temporary name?

The svn repository was built by cvs2svn, none of the git tools were involved.

-- 
Jon Smirl
jonsmirl@gmail.com

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12 16:41 UTC
  To: Jon Smirl; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120922g181a5aaal623fd3f29b839f4c@mail.gmail.com>

On Mon, 12 Jun 2006, Jon Smirl wrote:
>
> 64 files in tmp.
> But the SVN repository itself has 411,000 files in it. Split between
> two directories.

Ouch. That sounds like it. 

> Is there some pack equivalent for svn that I haven't found yet?

Is this literally what SVN does normally? That's just insane. I mean, even 
git tried to at least hash out the files (and yeah, admittedly even that 
worked less well than I was hoping for, but I at least fixed it within 
just a few weeks through the pack mechanism).

Or is that 411,000 files a result of how git-svnimport does things, rather 
than some basic SVN approach to life: does it perhaps end up checking out
each file under an individual temporary name?

			Linus
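
One way to check where the files are piling up, as asked above, is to count
directory entries. The Python sketch below does that for an arbitrary tree;
the function names are made up for the example.

```python
import os


def count_entries(top):
    """Map each directory under `top` to the number of entries directly in it."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(top):
        counts[dirpath] = len(dirnames) + len(filenames)
    return counts


def largest_directories(top, limit=5):
    """Return the `limit` most populated directories, largest first."""
    counts = count_entries(top)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    for path, entries in largest_directories("."):
        print(f"{entries:8d}  {path}")
```

Run against the repository in question, any directory holding hundreds of
thousands of entries would show up at the top of the list.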

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 16:32 UTC
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120922g181a5aaal623fd3f29b839f4c@mail.gmail.com>

On 6/12/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
> >
> >
> > On Mon, 12 Jun 2006, Jon Smirl wrote:
> > >
> > >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > > 14525 jonsmirl  16   0  604m 391m 1904 S   24 38.7 916:53.39 git-svnimport
> > > 20947 jonsmirl  17   0     0    0    0 R    1  0.0   0:00.03 git-svnimport
> >
> > Hard to tell, it's obviously got short-lived processes there too that it's
> > not showing, but equally obviously that svnimport script itself is
> > spending an alarming amount of CPU time. I don't think it should do that
> > much processing, but since it's written in perl, I can't read it.
> >
> > Are there any other directories that seem to be growing (eg some temp-file
> > directory where the old files aren't cleaned away?). I can't imagine what
> > else it could be doing in kernel space than simply some silly filesystem
> > operation, but dang it all, Linux filesystems are usually very efficient
> > indeed, unless we're talking huge directories (and if it's not the git
> > object directory any more, it must be something else).
>
> 64 files in tmp.
> But the SVN repository itself has 411,000 files in it. Split between
> two directories.

I'm doing all of this on ext3. I have plenty of free disk space so I
can make another partition and switch to a new file system after I
install the new RAM. What would be the best one to try? Doing that
would provide a data point to determine if this is a problem with file
system performance or the misuse of file systems.

-- 
Jon Smirl
jonsmirl@gmail.com

* Re: svn to git, N-squared?
From: Randal L. Schwartz @ 2006-06-12 16:25 UTC
  To: Linus Torvalds; +Cc: Jon Smirl, git
In-Reply-To: <86irn6wdob.fsf@blue.stonehenge.com>

>>>>> "Randal" == Randal L Schwartz <merlyn@stonehenge.com> writes:

>>>>> "Linus" == Linus Torvalds <torvalds@osdl.org> writes:
Linus> This sounds like _exactly_ what happens if you don't repack
Linus> occasionally.  Especially if you are using a filesystem without hashed
Linus> filename lookup, but it's true to some degree even with that - the
Linus> filesystem tends to end up spending tons of time in kernel space,
Linus> trying to find a place to put new objects.

Randal> I'm using git-svn to do a similar thing with an 11K-commit history.  It's now 4
Randal> days running, and yes, I'm repacking and deleting empty dirs every 200-300
Randal> commits, but I'm only up to commit 4000 or so.  At this rate, I *may* finish
Randal> by sometime next week. :(

Randal> However, I notice one thing that can't be good: .git/git-svn/revs has one file
Randal> per revision.  Yes, I'll end up with 11000 files in a single directory.  Ugh.

Another contributing factor is that there are 2500 files in the repo (at
revision 3931).  I was recording 20 commits a minute in the early part of the
cycle, and now I'm down to 1 commit every two minutes.  Doing a bit of
back-of-the-scribbled-on-envelope calcs, I won't be finished for
another two weeks or so. :(

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
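
The envelope calculation above can be reproduced with a few lines of Python.
The total of 11000 commits is an assumption standing in for "11K", and the
function name is invented.

```python
def remaining_days(total_commits, done_commits, minutes_per_commit):
    """Days left if every remaining commit takes `minutes_per_commit`."""
    remaining = total_commits - done_commits
    return remaining * minutes_per_commit / (60 * 24)


if __name__ == "__main__":
    # 11000 is an assumed round figure for the "11K-commit" history.
    print(f"{remaining_days(11000, 3931, 2):.1f} days")
```

At a constant one commit every two minutes this comes to just under ten
days; the "two weeks or so" estimate presumably allows for the rate
continuing to fall as the directories grow.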

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 16:22 UTC
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120906210.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 12 Jun 2006, Jon Smirl wrote:
> >
> >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 14525 jonsmirl  16   0  604m 391m 1904 S   24 38.7 916:53.39 git-svnimport
> > 20947 jonsmirl  17   0     0    0    0 R    1  0.0   0:00.03 git-svnimport
>
> Hard to tell, it's obviously got short-lived processes there too that it's
> not showing, but equally obviously that svnimport script itself is
> spending an alarming amount of CPU time. I don't think it should do that
> much processing, but since it's written in perl, I can't read it.
>
> Are there any other directories that seem to be growing (eg some temp-file
> directory where the old files aren't cleaned away?). I can't imagine what
> else it could be doing in kernel space than simply some silly filesystem
> operation, but dang it all, Linux filesystems are usually very efficient
> indeed, unless we're talking huge directories (and if it's not the git
> object directory any more, it must be something else).

64 files in tmp.
But the SVN repository itself has 411,000 files in it. Split between
two directories.

Is there some pack equivalent for svn that I haven't found yet?

> At least with the cvs importer I have _some_ clue what it's doing, since I
> wrote an earlier version myself (very different, but at least I know what
> the operations are). SVN has always just confused me, and I have no idea
> what svnimport does, so I think I'll have to defer to somebody who
> actually knows the code.
>
> Smurf, have you looked at any larger repositories?
>
>                 Linus
>


-- 
Jon Smirl
jonsmirl@gmail.com

* Re: svn to git, N-squared?
From: Randal L. Schwartz @ 2006-06-12 16:18 UTC
  To: Linus Torvalds; +Cc: Jon Smirl, git
In-Reply-To: <Pine.LNX.4.64.0606112028010.5498@g5.osdl.org>

>>>>> "Linus" == Linus Torvalds <torvalds@osdl.org> writes:

Linus> This sounds like _exactly_ what happens if you don't repack
Linus> occasionally.  Especially if you are using a filesystem without hashed
Linus> filename lookup, but it's true to some degree even with that - the
Linus> filesystem tends to end up spending tons of time in kernel space,
Linus> trying to find a place to put new objects.

I'm using git-svn to do a similar thing with an 11K-commit history.  It's now 4
days running, and yes, I'm repacking and deleting empty dirs every 200-300
commits, but I'm only up to commit 4000 or so.  At this rate, I *may* finish
by sometime next week. :(

However, I notice one thing that can't be good: .git/git-svn/revs has one file
per revision.  Yes, I'll end up with 11000 files in a single directory.  Ugh.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 16:16 UTC
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120843340.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 12 Jun 2006, Jon Smirl wrote:
> >
> > I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
> > any way to tell what it is doing in the kernel for so much time?
>
> oprofile will tell you.

I don't have profiling turned on in the kernel. I've turned it on so
I'll pick it up next time I reboot.
I'll kill everything and restart when my new RAM arrives tomorrow.

Hopefully the SVN import will finish before then but it doesn't look likely.

-- 
Jon Smirl
jonsmirl@gmail.com

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12 16:12 UTC
  To: Jon Smirl; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120855p1cec9acfy62dadb89c11756b4@mail.gmail.com>

On Mon, 12 Jun 2006, Jon Smirl wrote:
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 14525 jonsmirl  16   0  604m 391m 1904 S   24 38.7 916:53.39 git-svnimport
> 20947 jonsmirl  17   0     0    0    0 R    1  0.0   0:00.03 git-svnimport

Hard to tell, it's obviously got short-lived processes there too that it's 
not showing, but equally obviously that svnimport script itself is 
spending an alarming amount of CPU time. I don't think it should do that 
much processing, but since it's written in perl, I can't read it.

Are there any other directories that seem to be growing (eg some temp-file 
directory where the old files aren't cleaned away?). I can't imagine what 
else it could be doing in kernel space than simply some silly filesystem 
operation, but dang it all, Linux filesystems are usually very efficient 
indeed, unless we're talking huge directories (and if it's not the git 
object directory any more, it must be something else).

At least with the cvs importer I have _some_ clue what it's doing, since I 
wrote an earlier version myself (very different, but at least I know what 
the operations are). SVN has always just confused me, and I have no idea 
what svnimport does, so I think I'll have to defer to somebody who 
actually knows the code.

Smurf, have you looked at any larger repositories?

		Linus

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 15:55 UTC
  To: Linus Torvalds; +Cc: linux@horizon.com, git
In-Reply-To: <Pine.LNX.4.64.0606120843340.5498@g5.osdl.org>

On 6/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 12 Jun 2006, Jon Smirl wrote:
> >
> > I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
> > any way to tell what it is doing in the kernel for so much time?
>
> oprofile will tell you.
>
> I don't see why it would spend a lot of time in the kernel, unless it's
> the SVN part that does a ton of reads or something. git should have almost
> no kernel footprint apart from the individual objects creation/reading, so
> once it's repacked, I generally see very little system time.
>
> What does top say? (Ie can you see _which_ process spends time in the
> kernel?)

top - 11:54:32 up 4 days,  1:27,  5 users,  load average: 1.85, 1.74, 1.55
Tasks: 135 total,   2 running, 133 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.7% us, 35.3% sy,  0.0% ni, 49.3% id,  0.0% wa,  0.2% hi,  0.5% si,  0
Mem:   1035740k total,  1020836k used,    14904k free,    18368k buffers
Swap: 118222276k total,   645124k used, 117577152k free,   183172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14525 jonsmirl  16   0  604m 391m 1904 S   24 38.7 916:53.39 git-svnimport
20947 jonsmirl  17   0     0    0    0 R    1  0.0   0:00.03 git-svnimport
20864 jonsmirl  16   0  2120 1024  788 R    1  0.1   0:00.08 top
 2436 root      15   0 71184  28m 6100 S    0  2.8 119:13.55 Xorg
    1 root      16   0  1992  340  312 S    0  0.0   0:00.79 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:01.42 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0


-- 
Jon Smirl
jonsmirl@gmail.com

* Re: svn to git, N-squared?
From: Linus Torvalds @ 2006-06-12 15:45 UTC
  To: Jon Smirl; +Cc: linux@horizon.com, git
In-Reply-To: <9e4733910606120832xaf74e77pad7f70df864541fc@mail.gmail.com>

On Mon, 12 Jun 2006, Jon Smirl wrote:
> 
> I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
> any way to tell what it is doing in the kernel for so much time?

oprofile will tell you.

I don't see why it would spend a lot of time in the kernel, unless it's 
the SVN part that does a ton of reads or something. git should have almost 
no kernel footprint apart from the individual objects creation/reading, so 
once it's repacked, I generally see very little system time.

What does top say? (Ie can you see _which_ process spends time in the 
kernel?)
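
If top's combined %CPU column is not enough to answer that, the per-process split between user and kernel time can be read directly from /proc. The following is only a minimal sketch, assuming a Linux /proc filesystem as described in proc(5); the function names `parse_proc_stat` and `cpu_split` are made up for illustration and are not part of git or any of its tools.

```python
import os


def parse_proc_stat(stat_line):
    """Return (comm, utime_ticks, stime_ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' in the line.
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # 'fields' starts at field 3 (state) of proc(5), so
    # field 14 (utime) is index 11 and field 15 (stime) is index 12.
    return comm, int(fields[11]), int(fields[12])


def cpu_split(pid):
    """Return (comm, user_seconds, system_seconds) for a running process."""
    with open(f"/proc/{pid}/stat") as f:
        comm, utime, stime = parse_proc_stat(f.read())
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return comm, utime / ticks_per_second, stime / ticks_per_second


# Hypothetical usage, with the PID taken from top:
#   comm, user_s, sys_s = cpu_split(14525)
#   print(comm, "user:", user_s, "system:", sys_s)
```

A process whose system seconds grow much faster than its user seconds between two readings is the one spending its time in the kernel.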

		Linus

^ permalink raw reply

* Re: svn to git, N-squared?
From: Jon Smirl @ 2006-06-12 15:32 UTC (permalink / raw)
  To: linux@horizon.com; +Cc: git, torvalds
In-Reply-To: <20060612043949.20992.qmail@science.horizon.com>

On 12 Jun 2006 00:39:49 -0400, linux@horizon.com <linux@horizon.com> wrote:
> Insanity is copying the data rather than just the file name.  Git is
> good about not reading unnecessary files, and anything necessary should
> be cached, so on-disk fragmentation is not a concern.

I've run a pack and I moved the objects to new directories. Directory
is 746M with 64K files now.
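
One way to check how the remaining loose files are spread over the object directories is to count the entries in each two-hex-digit fan-out directory. This is a rough sketch only, assuming the standard .git/objects layout; `loose_object_counts` is a hypothetical helper name, not a git command.

```python
import os
import re


def loose_object_counts(objects_dir):
    """Map each two-hex-digit fan-out directory to its number of entries."""
    counts = {}
    for name in sorted(os.listdir(objects_dir)):
        path = os.path.join(objects_dir, name)
        # Skip "pack", "info" and anything else that is not a fan-out directory.
        if re.fullmatch(r"[0-9a-f]{2}", name) and os.path.isdir(path):
            counts[name] = len(os.listdir(path))
    return counts


# Hypothetical usage from the top of a repository:
#   counts = loose_object_counts(".git/objects")
#   print(sum(counts.values()), "loose files in", len(counts), "directories")
```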

I've stabilized like this. 1GB RAM with 2.8GHz P4 hyperthread. Is there
any way to tell what it is doing in the kernel for so much time?

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1  0 599128  24712  38196 247008    0    0     0     0  451   382 12 39 48  0  0
 1  0 599128  24836  38196 246872    0    0     0     4  413   354 15 38 48  0  0
 1  0 599128  24960  38212 246856    0    0     0    64  453   390 15 37 48  0  0
 1  0 599128  24960  38212 246856    0    0     0     0  414   367 12 40 49  0  0
 1  0 599128  23504  38212 248216    0    0     0     0  448   365 13 39 48  0  0
 1  0 599128  24156  38212 247604    0    0     0     0  407   355 13 39 49  0  0
 1  0 599128  25240  38212 246652    0    0     0     0  446   390 13 39 48  0  0
 1  0 599128  25240  38224 246572    0    0     4    48  415   418 12 40 47  0  0
 1  0 599128  25116  38232 246496    0    0     0    12  452   432 12 40 48  0  0

Still doesn't seem to be making much forward progress.
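
For anyone who wants to summarize the vmstat samples rather than eyeball them, a small sketch like the one below averages the CPU columns. The rows are copied from the output in this message with the mail line wrapping undone; the column list assumes the 17-column vmstat layout shown in the header, and the function names are illustrative only.

```python
SAMPLES = """\
 1  0 599128  24712  38196 247008    0    0     0     0  451   382 12 39 48  0  0
 1  0 599128  24836  38196 246872    0    0     0     4  413   354 15 38 48  0  0
 1  0 599128  24960  38212 246856    0    0     0    64  453   390 15 37 48  0  0
 1  0 599128  24960  38212 246856    0    0     0     0  414   367 12 40 49  0  0
 1  0 599128  23504  38212 248216    0    0     0     0  448   365 13 39 48  0  0
 1  0 599128  24156  38212 247604    0    0     0     0  407   355 13 39 49  0  0
 1  0 599128  25240  38212 246652    0    0     0     0  446   390 13 39 48  0  0
 1  0 599128  25240  38224 246572    0    0     4    48  415   418 12 40 47  0  0
 1  0 599128  25116  38232 246496    0    0     0    12  452   432 12 40 48  0  0
"""

COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def parse_vmstat(text):
    """Turn purely numeric vmstat rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Header lines and blank lines fail this check and are skipped.
        if len(parts) == len(COLUMNS) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(COLUMNS, map(int, parts))))
    return rows


def column_means(rows, names=("us", "sy", "id", "wa")):
    """Average the selected columns over all sampled rows."""
    return {n: sum(r[n] for r in rows) / len(rows) for n in names}


if __name__ == "__main__":
    print(column_means(parse_vmstat(SAMPLES)))
```

Over these nine samples the system column averages 39 against 13 for user, with no I/O wait.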

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: bisect and gitk happy together
From: Linus Torvalds @ 2006-06-12 15:10 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90606120441p74dd4872y441fe04470f4acd5@mail.gmail.com>

On Mon, 12 Jun 2006, Martin Langhoff wrote:
> 
> - git-bisect visualise wasn't as useful as just a plain gitk. (This
> may be because I was working with ~60 commits in a medium-sized
> project).

Definitely. Try just firing up gitk when you're bisecting a kernel archive 
with thousands of commits, and complex history..

That's when "git bisect visualize" really helps: when git bisect has 
already narrowed down the list of commits from "5 years" to "1 week", but 
you still have maybe a hundred-odd commits to go.
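
To give a feel for how quickly the candidate set shrinks, here is a simplified sketch of bisection on a strictly linear history. It is not how git bisect is implemented: real history is a graph, and git picks the commit that best halves the remaining candidates. The names `first_bad` and `is_bad` are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return (index of the first bad commit, number of tests run).

    Assumes a linear history where commits[0] is known good,
    commits[-1] is known bad, and every commit after the first
    bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # the first bad commit is at mid or earlier
        else:
            good = mid  # the first bad commit is after mid
    return bad, tests


if __name__ == "__main__":
    history = list(range(101))  # stand-in for a hundred-odd commits
    index, tests = first_bad(history, lambda c: c >= 73)
    print("first bad commit:", index, "found after", tests, "tests")
```

Under those assumptions, a hundred-odd commits take roughly seven tests, since each test halves the range.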

I agree that just plain "gitk" is actually nicer if you want to see the 
whole context. It's just that often the context is pretty damn confusing ;)

> - gitk didn't show the bad commit tagged specially, even if
> git-bisect had just identified it. Of course I could find it, but I
> had all the other good/bad commits well labelled. And not the one I
> was looking for. Odd.

It should be the head of the "bisect" branch, and naturally tagged that 
way.

			Linus

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Linus Torvalds @ 2006-06-12 14:59 UTC (permalink / raw)
  To: Florian Forster; +Cc: Martin Langhoff, git
In-Reply-To: <20060612082448.GA11857@verplant.org>

On Mon, 12 Jun 2006, Florian Forster wrote:
> 
> Would it help to cache `git-annotate's output, e.g. using one of the
> `Cache::Cache' modules? Or is browsing of blobs too sparse for this to
> result in a performance gain? I'm sure the modules could be integrated
> as a weak precondition.

The apache setup at least on kernel.org is already set up to do caching, 
as long as the generated headers for the page allow it in the first place.

So caching inside gitweb is generally pointless, at least when it's at the 
level of one result page. At a higher level, if the internal caching might 
improve performance of _other_ pages because it caches the result of some 
intermediate important thing, it might be a different issue.

		Linus

^ permalink raw reply
