* irc usage..
@ 2006-05-20 17:26 Linus Torvalds
  2006-05-20 17:50 ` Junio C Hamano
  2006-05-20 20:39 ` Yann Dirson
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2006-05-20 17:26 UTC (permalink / raw)
To: Git Mailing List

I hate irc.

I'm reading the irc logs, and seeing that people have problems, but (a) it
was while I was asleep and (b) irc use doesn't encourage people to
actually explain what the problems _are_, so I have no clue.

So now I know that "spyderous" has problems importing some 1GB gentoo CVS
archive, but that's pretty much it. Grr.

Are people afraid to post to git@vger.kernel.org, or what? I saw that
people tried to suggest posting to the git mailing list, but can any of
you who are active on irc be a bit more forceful?

And perhaps we don't make this mailing list address well enough known? As
far as I'm aware, the git mailing list isn't closed, so people should be
able to post here without even subscribing. I can well understand that you
might not want to subscribe and prefer to look over the list through some
archive setup (the way I look at the irc logs), and maybe we should just
make the git mailing list address more obvious.

Right now, the "community" page at http://git.or.cz/community.html doesn't
even mention the git mailing list address directly, it just tells you how
you can subscribe and read the archives.

Can we perhaps fix that, and the people who are active on irc please also
make it clear to people that if they have some real problems that don't
get an immediate answer, the git mailing list ends up where a lot of
people can actually look more closely at it.. And tell them what the
address is.

		Linus

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-20 17:26 irc usage Linus Torvalds @ 2006-05-20 17:50 ` Junio C Hamano 2006-05-20 18:52 ` Jakub Narebski 2006-05-20 20:39 ` Yann Dirson 1 sibling, 1 reply; 83+ messages in thread From: Junio C Hamano @ 2006-05-20 17:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > I hate irc. >... > Can we perhaps fix that, and the people who are active on irc please also > make it clear to people that if they have some real problems that don't > get an immediate answer, the git mailing list ends up where a lot of > people can actually look more closely at it.. And tell them what the > address is. I hate irc, too. Number of times easily solvable usage problems come up and I look at the log to realize when the solutions suggested were waaaaay suboptimal it is too late (with loops being quite active recently things have improved a lot, but we should not expect him to be 24/7). Maybe somebody can run a dumb 'bot that notices somebody said something that ends with a '?' and there is no activity there for N minutes and inject a recorded message that reminds the mailing list address ;-). ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-20 17:50 ` Junio C Hamano @ 2006-05-20 18:52 ` Jakub Narebski 0 siblings, 0 replies; 83+ messages in thread From: Jakub Narebski @ 2006-05-20 18:52 UTC (permalink / raw) To: git Junio C Hamano wrote: > Maybe somebody can run a dumb 'bot that notices somebody said > something that ends with a '?' and there is no activity there > for N minutes and inject a recorded message that reminds the > mailing list address ;-). Or something like fsbot or other bots on #emacs channel http://www.emacswiki.org/cgi-bin/wiki/EmacsChannel -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage..
  2006-05-20 17:26 irc usage Linus Torvalds
  2006-05-20 17:50 ` Junio C Hamano
@ 2006-05-20 20:39 ` Yann Dirson
  2006-05-20 22:18 ` Donnie Berkholz
  2006-05-22 1:45 ` Linus Torvalds
  1 sibling, 2 replies; 83+ messages in thread
From: Yann Dirson @ 2006-05-20 20:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List

On Sat, May 20, 2006 at 10:26:22AM -0700, Linus Torvalds wrote:
> I'm reading the irc logs, and seeing that people have problems, but (a) it
> was while I was asleep and (b) irc use doesn't encourage people to
> actually explain what the problems _are_, so I have no clue.
>
> So now I know that "spyderous" has problems importing some 1GB gentoo CVS
> archive, but that's pretty much it. Grr.

FWIW, I have mentioned a problem that may be the same, under
Message-ID <20060107090148.GB32585@nowhere.earth>, that was on January
7th.

Namely, when importing a repository with very large files over pserver
or ssh, timeouts can occur and prevent the import from working.

But, as you said, it's not easy to get precise info from the logs :)

Best regards,
-- 
Yann Dirson <ydirson@altern.org>  | Debian-related: <dirson@debian.org>
                                  | Support Debian GNU/Linux:
                                  | Freedom, Power, Stability, Gratis
http://ydirson.free.fr/           | Check <http://www.debian.org/>

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-20 20:39 ` Yann Dirson @ 2006-05-20 22:18 ` Donnie Berkholz 2006-05-20 22:45 ` Linus Torvalds 2006-05-21 1:14 ` Donnie Berkholz 2006-05-22 1:45 ` Linus Torvalds 1 sibling, 2 replies; 83+ messages in thread From: Donnie Berkholz @ 2006-05-20 22:18 UTC (permalink / raw) To: Yann Dirson; +Cc: Linus Torvalds, Git Mailing List [-- Attachment #1: Type: text/plain, Size: 1725 bytes --] Yann Dirson wrote: > On Sat, May 20, 2006 at 10:26:22AM -0700, Linus Torvalds wrote: >> I'm reading the irc logs, and seeing that people have problems, but (a) it >> was while I was asleep and (b) irc use doesn't encourage people to >> actually explain what the problems _are_, so I have no clue. >> >> So now I know that "spyderous" has problems importing some 1GB gentoo CVS >> archive, but that's pretty much it. Grr. Hi all, I just subscribed and this post is the only one I've got from the thread, so I'm responding to it instead of the original. Gentoo's an IRC-based community, so I tend to try IRC first for any problems I have and fall back to the list later if I can't get things figured out. Here's a rough summary: Our main repo is actually a bit over 2G (2103621223) now that I check, but it's not very complex. There's actually just one branch, and I don't think anyone would care if we lost the history from it because it's a release branch from a few years ago. Somebody else tried importing it with git-cvsimport, but he said he hit some kind of problem and recalled that it was a cvsps segfault. Sounds about right, since I've never gotten cvsps to run successfully on the whole repo either. I tried with parsecvs, but it runs into OOM even on a machine with 4G RAM after reading in all the ,v files, presumably while it's building some huge tree of changesets in memory. Keith Packard's suggested that there are ways to reduce parsecvs's memory use, because it retains the full tree in memory for each revision rather than just the files that actually changed. 
But my C skills are pretty weak; I'm an OK reader but not much of a writer yet. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-20 22:18 ` Donnie Berkholz @ 2006-05-20 22:45 ` Linus Torvalds 2006-05-20 23:12 ` Donnie Berkholz 2006-05-21 9:46 ` Thomas Glanzmann 2006-05-21 1:14 ` Donnie Berkholz 1 sibling, 2 replies; 83+ messages in thread From: Linus Torvalds @ 2006-05-20 22:45 UTC (permalink / raw) To: Donnie Berkholz; +Cc: Yann Dirson, Git Mailing List On Sat, 20 May 2006, Donnie Berkholz wrote: > > Our main repo is actually a bit over 2G (2103621223) now that I check, > but it's not very complex. There's actually just one branch, and I don't > think anyone would care if we lost the history from it because it's a > release branch from a few years ago. Can you point to it? I'm not a CVS user, but I've played with cvsps before (to get it to work), and I'm a humanitarian - rescuing people from CVS is to me not just a good idea, it's a moral imperative. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-20 22:45 ` Linus Torvalds @ 2006-05-20 23:12 ` Donnie Berkholz 2006-05-21 19:24 ` Linus Torvalds 2006-05-21 9:46 ` Thomas Glanzmann 1 sibling, 1 reply; 83+ messages in thread From: Donnie Berkholz @ 2006-05-20 23:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Yann Dirson, Git Mailing List [-- Attachment #1: Type: text/plain, Size: 826 bytes --] Linus Torvalds wrote: > > On Sat, 20 May 2006, Donnie Berkholz wrote: >> Our main repo is actually a bit over 2G (2103621223) now that I check, >> but it's not very complex. There's actually just one branch, and I don't >> think anyone would care if we lost the history from it because it's a >> release branch from a few years ago. > > Can you point to it? I'm not a CVS user, but I've played with cvsps before > (to get it to work), and I'm a humanitarian - rescuing people from CVS is > to me not just a good idea, it's a moral imperative. I don't want to post the link publicly for a few reasons, including the huge amount of bandwidth it would suck up for lots of people to download it. I've sent it to you off-list, and if anyone else would also like it, please drop me a note. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage..
  2006-05-20 23:12 ` Donnie Berkholz
@ 2006-05-21 19:24 ` Linus Torvalds
  2006-05-22 3:59 ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2006-05-21 19:24 UTC (permalink / raw)
To: Donnie Berkholz; +Cc: Yann Dirson, Git Mailing List

On Sat, 20 May 2006, Donnie Berkholz wrote:
>
> I don't want to post the link publicly for a few reasons, including the
> huge amount of bandwidth it would suck up for lots of people to download
> it. I've sent it to you off-list, and if anyone else would also like it,
> please drop me a note.

Ok. It's still converting (that's a big archive), but it has passed the
cvsps stage without errors for me, and the conversion so far seems ok. But
it has only gotten to

	Author: vapier <vapier> 2002-09-23 12:32:42
	    Changed GPL to GPL-2 in LICENSE and updated SRC_URI to use mirror:

so it has converted only slightly more than the first two years of
history in the roughly 30 minutes I've let it run. So it will take several
hours.

The reason it works for me is likely simply the fact that I had a few
patches to my cvsps already. I'm appending the stupid patches, I'm not
guaranteeing that they are correct at all, although the three _committed_
patches are almost certainly correct (and the last uncommitted one is
almost certainly totally broken).

The patches are against clean cvsps 2.1.

Also, when I say "the conversion so far seems ok", I obviously don't
actually know what the hell the archive is supposed to look like, so I can
only say that the end result seems not totally insane.

To do a good conversion, you'll want to make sure that you have an author
name conversion file. See the "-A" flag in "git help cvsimport" (if you
have the man-pages installed).

		Linus

---
commit 534120d9a47062eecd7b53fd7ac0b70d97feb4fd
Author: Linus Torvalds <torvalds@g5.osdl.org>
Date:   Wed Mar 22 11:20:59 2006 -0800

    Increase log-length limit to 64kB

    Yeah, it should be dynamic. I'm lazy.
---
 cvsps_types.h | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cvsps_types.h b/cvsps_types.h
index b41e2a9..dba145d 100644
--- a/cvsps_types.h
+++ b/cvsps_types.h
@@ -8,7 +8,7 @@
 #define CVSPS_TYPES_H
 
 #include <time.h>
-#define LOG_STR_MAX 32768
+#define LOG_STR_MAX 65536
 #define AUTH_STR_MAX 64
 #define REV_STR_MAX 64
 #define MIN(a, b) ((a) < (b) ? (a) : (b))

commit 82fcf7e31bbeae3b01a8656549e9b8fd89d598eb
Author: Linus Torvalds <torvalds@g5.osdl.org>
Date:   Wed Mar 22 11:23:37 2006 -0800

    Improve handling of file collisions in the same patchset

    Take the file revision into account.
---
 cvsps.c | 27 +++++++++++++++++++++++++--
 1 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/cvsps.c b/cvsps.c
index 1e64e3c..c22147e 100644
--- a/cvsps.c
+++ b/cvsps.c
@@ -2384,8 +2384,31 @@ void patch_set_add_member(PatchSet * ps,
     for (next = ps->members.next; next != &ps->members; next = next->next)
     {
         PatchSetMember * m = list_entry(next, PatchSetMember, link);
-        if (m->file == psm->file && ps->collision_link.next == NULL)
-                list_add(&ps->collision_link, &collisions);
+        if (m->file == psm->file) {
+                int order = compare_rev_strings(psm->post_rev->rev, m->post_rev->rev);
+
+                /*
+                 * Same revision too? Add it to the collision list
+                 * if it isn't already.
+                 */
+                if (!order) {
+                        if (ps->collision_link.next == NULL)
+                                list_add(&ps->collision_link, &collisions);
+                        return;
+                }
+
+                /*
+                 * If this is an older revision than the one we already have
+                 * in this patchset, just ignore it
+                 */
+                if (order < 0)
+                        return;
+
+                /*
+                 * This is a newer one, remove the old one
+                 */
+                list_del(&m->link);
+        }
     }
 
     psm->ps = ps;

commit 3d1ebcef6b4f9f6c9064efd64da4dd30d93c3c96
Author: Linus Torvalds <torvalds@g5.osdl.org>
Date:   Wed Mar 22 17:20:20 2006 -0800

    Fix branch ancestor calculation

    Not having any ancestor at all means that any valid ancestor (even of
    "depth 0") is fine.

    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
 cvsps.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cvsps.c b/cvsps.c
index c22147e..2695a0f 100644
--- a/cvsps.c
+++ b/cvsps.c
@@ -2599,7 +2599,7 @@ static void determine_branch_ancestor(Pa
              * note: rev is the pre-commit revision, not the post-commit
              */
             if (!head_ps->ancestor_branch)
-                d1 = 0;
+                d1 = -1;
             else if (strcmp(ps->branch, rev->branch) == 0)
                 continue;
             else if (strcmp(head_ps->ancestor_branch, "HEAD") == 0)

uncommitted diff
Author: Linus Torvalds <torvalds@g5.osdl.org>

    Probably totally broken dot counting
---
 cvsps.c | 13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/cvsps.c b/cvsps.c
index 2695a0f..2ad1595 100644
--- a/cvsps.c
+++ b/cvsps.c
@@ -2357,9 +2357,16 @@ static int revision_affects_branch(CvsFi
 static int count_dots(const char * p)
 {
     int dots = 0;
+    int len = strlen(p);
 
-    while (*p)
-        if (*p++ == '.')
+    while (len > 2) {
+        if (memcmp(p+len-2, ".1", 2))
+            break;
+        len -= 2;
+    }
+
+    while (len)
+        if (p[--len] == '.')
             dots++;
 
     return dots;
@@ -2613,7 +2620,7 @@ static void determine_branch_ancestor(Pa
             /* HACK: we sometimes pretend to derive from the import branch.
              * just don't do that. this is the easiest way to prevent...
              */
-            d2 = (strcmp(rev->rev, "1.1.1.1") == 0) ? 0 : count_dots(rev->rev);
+            d2 = count_dots(rev->rev);
 
             if (d2 > d1)
                 head_ps->ancestor_branch = rev->branch;

^ permalink raw reply related [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-21 19:24 ` Linus Torvalds @ 2006-05-22 3:59 ` Linus Torvalds 2006-05-22 4:19 ` Donnie Berkholz 0 siblings, 1 reply; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 3:59 UTC (permalink / raw) To: Donnie Berkholz; +Cc: Yann Dirson, Git Mailing List On Sun, 21 May 2006, Linus Torvalds wrote: > > Ok. It's still converting (that's a big archive), but it has passed the > cvsps stage without errors for me, and the conversion so far seems ok. But > it has only gotten to > > Author: vapier <vapier> 2002-09-23 12:32:42 > Changed GPL to GPL-2 in LICENSE and updated SRC_URI to use mirror: > > so it has converted only slightly more than the first two years of > history in the roughly 30 minutes I've let it run. So it will take several > hours. Btw, trying this import (which got interrupted by a thunderstorm and one of our first power failures in a long time - just a few seconds, but enough to power off everything but my laptops) it became very obvious that "git cvsimport" really _really_ should re-pack the archive every once in a while. The old "repack every month or so" approach doesn't work that well when you try to import several years of history in a few hours. Now, you can just repack after the whole thing is done (it will probably take no more than ~15 minutes or so), but it would probably be best if the import script itself decided to repack every once in a while just to avoid wasting a lot of diskspace _during_ the import itself. So this isn't so much a correctness issue as a "avoid wasting time and space" issue, but still.. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 3:59 ` Linus Torvalds @ 2006-05-22 4:19 ` Donnie Berkholz 2006-05-22 4:50 ` Linus Torvalds 0 siblings, 1 reply; 83+ messages in thread From: Donnie Berkholz @ 2006-05-22 4:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Yann Dirson, Git Mailing List [-- Attachment #1: Type: text/plain, Size: 2416 bytes --] Linus Torvalds wrote: > > On Sun, 21 May 2006, Linus Torvalds wrote: >> Ok. It's still converting (that's a big archive), but it has passed the >> cvsps stage without errors for me, and the conversion so far seems ok. But >> it has only gotten to >> >> Author: vapier <vapier> 2002-09-23 12:32:42 >> Changed GPL to GPL-2 in LICENSE and updated SRC_URI to use mirror: >> >> so it has converted only slightly more than the first two years of >> history in the roughly 30 minutes I've let it run. So it will take several >> hours. > > Btw, trying this import (which got interrupted by a thunderstorm and one > of our first power failures in a long time - just a few seconds, but > enough to power off everything but my laptops) it became very obvious that > "git cvsimport" really _really_ should re-pack the archive every once in a > while. Fortunately the storms haven't been that bad down in Corvallis. cvsps also worked fine for me, but git-cvsimport broke in the middle. 
The command I'm using is 'git-cvsimport -P ../gentoo.cvsps -k -d
/media/scm_comparison -A ~/dev/Authors -v gentoo-x86 | tee cvsimport.log'

Here's the last bits:

Fetching gnome-base/gnome-applets/gnome-applets-1.4.0.4-r1.ebuild v 1.5
Update gnome-base/gnome-applets/gnome-applets-1.4.0.4-r1.ebuild: 947 bytes
Fetching gnome-base/gnome-applets/gnome-applets-1.4.0.4-r2.ebuild v 1.3
Update gnome-base/gnome-applets/gnome-applets-1.4.0.4-r2.ebuild: 977 bytes
Fetching gnome-base/gnome-applets/gnome-applets-2.0.0-r1.ebuild v 1.2
Update gnome-base/gnome-applets/gnome-applets-2.0.0-r1.ebuild: 2704 bytes
Fetching gnome-base/gnome-applets/gnome-applets-2.0.0.ebuild v 1.2
Update gnome-base/gnome-applets/gnome-applets-2.0.0.ebuild: 3031 bytes
Tree ID 4d19a84efce2de9cfb42ac0397e0036bbed2ad65
Parent ID ecb78bbe30369a76e2599d0d17de8fe922dca211
Committed patch 14615 (origin 2002-07-16 20:13:15)
Commit ID 4dd2179e0c1369e07cd268fb5c8b150c3a2a1094
Delete net-fs/openafs/openafs-1.2.2-r6.ebuild
Delete net-fs/openafs/files/digest-openafs-1.2.2-r6
Tree ID bfc7320883983655d7d2ea2c6d04f85b45365ce1
Parent ID 4dd2179e0c1369e07cd268fb5c8b150c3a2a1094
Committed patch 14616 (origin 2002-07-16 20:15:15)
Commit ID 7a36de9c4c9b93337ed789ae2341cad3d0991c6d
Unknown: error Cannot allocate memory
Fetching profiles/package.mask v 1.992
cat: write error: Broken pipe

Thanks,
Donnie

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 4:19 ` Donnie Berkholz @ 2006-05-22 4:50 ` Linus Torvalds 2006-05-22 5:04 ` Martin Langhoff ` (2 more replies) 0 siblings, 3 replies; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 4:50 UTC (permalink / raw) To: Donnie Berkholz Cc: Yann Dirson, Git Mailing List, Matthias Urlichs, Martin Langhoff, Johannes Schindelin On Sun, 21 May 2006, Donnie Berkholz wrote: > > Fortunately the storms haven't been that bad down in Corvallis. cvsps > also worked fine for me, but git-cvsimport broke in the middle. Hmm. It's actually possible that it did that for me too - I had put the cvsimport in an xterm and forgotten about it, and just assumed that the power failure was what broke it. But maybe it had broken down before that happened - I just don't have any logs left ;) > Here's the last bits: > > [ snip snip ] > Commit ID 7a36de9c4c9b93337ed789ae2341cad3d0991c6d > Unknown: error Cannot allocate memory > Fetching profiles/package.mask v 1.992 > cat: write error: Broken pipe Hmm. I don't actually know perl, and my original "cvsimport" script was actually this funny C program that generated a shell script to do the import. That worked fine, and had no memory leaks, but it was a truly hacky thing of horrible beauty. Or rather, it _would_ have been that, if it had had any beauty to be horrible about. But at least I would have been able to debug it. But the perl one I can't parse any more. That said, the whole "Unknown:" printout seems to come from the subroutine "_line()", which just reads a line from the cvs server. Did you do a "top" at any time just before this all happened? It _sounds_ like it might actually be a memory leak on the CVS server side, and the problem may (or may not) be due to the optimization that keeps a single long-running CVS server instance for the whole process. I wouldn't be in the least surprised if that ends up triggering a slow leak in CVS itself, and then CVS runs out of memory. 
That would likely have been obvious in any "top" output just before the failure. Smurf, Martin, Dscho.. Any ideas? My old script just ran RCS directly on the files, and had no issues like that. I'll happily admit that my old script generator thing was horrible, but it was a lot easier to debug than the smarter perl script that uses a CVS server connection.. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage..
  2006-05-22 4:50 ` Linus Torvalds
@ 2006-05-22 5:04 ` Martin Langhoff
  2006-05-22 5:21 ` Donnie Berkholz
  2006-05-22 7:42 ` Martin Langhoff
  2 siblings, 0 replies; 83+ messages in thread
From: Martin Langhoff @ 2006-05-22 5:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
    Martin Langhoff, Johannes Schindelin

On 5/22/06, Linus Torvalds <torvalds@osdl.org> wrote:
> I wouldn't be in the least surprised if that ends up triggering a slow
> leak in CVS itself, and then CVS runs out of memory.

I'm dying to try this out myself after work. I wouldn't rule out that
cvsimport is stuffing data in an array that grows forever. In any case
you'll hear from me soon.

martin

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 4:50 ` Linus Torvalds 2006-05-22 5:04 ` Martin Langhoff @ 2006-05-22 5:21 ` Donnie Berkholz 2006-05-22 7:42 ` Martin Langhoff 2 siblings, 0 replies; 83+ messages in thread From: Donnie Berkholz @ 2006-05-22 5:21 UTC (permalink / raw) To: Linus Torvalds Cc: Yann Dirson, Git Mailing List, Matthias Urlichs, Martin Langhoff, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 451 bytes --] Linus Torvalds wrote: > Did you do a "top" at any time just before this all happened? It _sounds_ > like it might actually be a memory leak on the CVS server side, and the > problem may (or may not) be due to the optimization that keeps a single > long-running CVS server instance for the whole process. No. =\ I just started the thing running in a screen session and came back a few hours later to find it like that. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 4:50 ` Linus Torvalds 2006-05-22 5:04 ` Martin Langhoff 2006-05-22 5:21 ` Donnie Berkholz @ 2006-05-22 7:42 ` Martin Langhoff 2006-05-22 9:13 ` Linus Torvalds 2 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-22 7:42 UTC (permalink / raw) To: Linus Torvalds Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/22/06, Linus Torvalds <torvalds@osdl.org> wrote: > Did you do a "top" at any time just before this all happened? It _sounds_ > like it might actually be a memory leak on the CVS server side, and the > problem may (or may not) be due to the optimization that keeps a single > long-running CVS server instance for the whole process. Running a few tests right now. Looks like cvs (Debian/etch 1.12.9-13) itself is not leaking any memory. The Perl (Debian/etch 5.8.7-something and now 5.8.8-4) process OTOH is visibly allocating memory. Starts off at 4MB and gets up to ~17MB by the time it has done 6K commits. I am trying to figure out whether the leak is in the script or in the Perl implementation, using PadWalk, Devel::Leak and friends. If the leak is here, I can't see it (yet). > I wouldn't be in the least surprised if that ends up triggering a slow > leak in CVS itself, and then CVS runs out of memory. Or a slow leak in Perl? The 5.8.8 release notes do talk about some leaks being fixed, but this 5.8.8 isn't making a difference. Working on it. martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 7:42 ` Martin Langhoff @ 2006-05-22 9:13 ` Linus Torvalds 2006-05-22 12:54 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 9:13 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On Mon, 22 May 2006, Martin Langhoff wrote: > > Or a slow leak in Perl? The 5.8.8 release notes do talk about some > leaks being fixed, but this 5.8.8 isn't making a difference. > > Working on it. Thanks. Looking at what I did convert, that horrid gentoo CVS tree is interesting. The resulting (partial) git history has 93413 commits and 850,000+ objects total, all in a totally linear history. And that's just up to April 2004, so the full tree is probably a million objects. The good news is that git seems to handle that size repo no problem at all. The repack did indeed take a long while, but it packed it all down to a 189MB pack-file (and 20MB pack index). Considering that the bzip2'd tar-file of the CVS history was 157MB, and the actual CVS footprint was about 1.6GB, if git stays at under a quarter gigabyte for the whole archive once converted (which sounds likely, counting indexing), git would basically cut down the disk usage for a live repo by a factor of 7 or so. _And_ I can do a "git log origin > /dev/null" in about 2.4 seconds. Take that, CVS. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
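The "factor of 7 or so" can be checked from the figures in the mail above. As a rough sketch, assuming decimal units (1.6GB taken as 1600MB, a quarter gigabyte as 250MB):

```latex
\[
\frac{1600\,\mathrm{MB}}{189\,\mathrm{MB} + 20\,\mathrm{MB}} \approx 7.7
\qquad \text{(partial history: pack-file plus pack index)}
\]
\[
\frac{1600\,\mathrm{MB}}{250\,\mathrm{MB}} = 6.4
\qquad \text{(full history, if it stays under a quarter gigabyte)}
\]
```

So the estimate sits between the ratio already measured for the partial import and the lower bound implied by the quarter-gigabyte guess.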
* Re: irc usage..
  2006-05-22 9:13 ` Linus Torvalds
@ 2006-05-22 12:54 ` Martin Langhoff
  2006-05-22 17:27 ` Linus Torvalds
  2006-05-22 19:09 ` Donnie Berkholz
  0 siblings, 2 replies; 83+ messages in thread
From: Martin Langhoff @ 2006-05-22 12:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
    Johannes Schindelin

On 5/22/06, Linus Torvalds <torvalds@osdl.org> wrote:
> On Mon, 22 May 2006, Martin Langhoff wrote:
> >
> > Or a slow leak in Perl? The 5.8.8 release notes do talk about some
> > leaks being fixed, but this 5.8.8 isn't making a difference.
> >
> > Working on it.
>
> Thanks. Looking at what I did convert, that horrid gentoo CVS tree is
> interesting. The resulting (partial) git history has 93413 commits and
> 850,000+ objects total, all in a totally linear history.

Ok, so there's 3 patches posted that should help narrow down the
problem. There's a new -L <limit> so that Donnie can get his stuff
done by running it in a while(true) loop. Not proud of it, but hey.

And there are two patches that I suspect may fix the leak. After
applying them, the cvsimport process grows up to ~13MB and then tapers
off, at least as far as my patience has gotten me. It's late on this
side of the globe so I'll look at the results tomorrow morning.

(BTW, I typo-ed Linus' address in the git-send-email invocation. Will
resend to him separately)

I'll also prep a patch as Linus suggests to do auto-repacking while
the import runs so we don't eat up the harddisk.

> git would basically cut down the disk usage for a live
> repo by a factor of 7 or so.
>
> _And_ I can do a "git log origin > /dev/null" in about 2.4 seconds. Take
> that, CVS.

Heh. Faster Gitticat, Kill Kill Kill!

martin

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage..
  2006-05-22 12:54 ` Martin Langhoff
@ 2006-05-22 17:27 ` Linus Torvalds
  2006-05-22 17:51 ` Jakub Narebski
  2006-05-22 19:46 ` Martin Langhoff
  2006-05-22 19:09 ` Donnie Berkholz
  1 sibling, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2006-05-22 17:27 UTC (permalink / raw)
To: Martin Langhoff
Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
    Johannes Schindelin

On Tue, 23 May 2006, Martin Langhoff wrote:
>
> And there are two patches that I suspect may fix the leak. After
> applying them, the cvsimport process grows up to ~13MB and then tapers
> off, at least as far as my patience has gotten me. It's late on this
> side of the globe so I'll look at the results tomorrow morning.

Ok, initial results are promising. git-cvsimport appears to be still
slowly growing, but it's at 40M (ie pretty tiny, considering that cvsps
grew to 800+MB on this archive) and growth seems to actually be slowing.

My conversion is only up to September 2002, but if it doesn't suddenly hit
some huge growth spurt, I wouldn't expect it to run out of memory.

The CVS server process itself is tiny, and doesn't seem to grow at all.

As to packing, doing something like

	while :
	do
		sleep 30
		#
		# repack roughly every 25600 objects
		#
		n=$(ls .git/objects/00 2> /dev/null | wc -l)
		if [ $n -gt 100 ]; then
			git repack -a
			#
			# Stupid sleep to make sure that nobody is still
			# using any unpacked objects after the pack got
			# generated
			#
			sleep 10
			git prune-packed
		fi
	done

or similar (the above is totally untested - I've just done it by hand a
few times) should work. It's perfectly ok to repack the archive even while
the cvsimport script is adding more data and changing it.

		Linus

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage..
  2006-05-22 17:27 ` Linus Torvalds
@ 2006-05-22 17:51 ` Jakub Narebski
  2006-05-22 18:03 ` Linus Torvalds
  2006-05-22 19:46 ` Martin Langhoff
  1 sibling, 1 reply; 83+ messages in thread
From: Jakub Narebski @ 2006-05-22 17:51 UTC (permalink / raw)
To: git

Linus Torvalds wrote:

>			git repack -a
>			#
>			# Stupid sleep to make sure that nobody is still
>			# using any unpacked objects after the pack got
>			# generated
>			#
>			sleep 10
>			git prune-packed

Is it really necessary (on Linux at least)? Git boasts of its atomicity...

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage..
  2006-05-22 17:51 ` Jakub Narebski
@ 2006-05-22 18:03 ` Linus Torvalds
  2006-05-22 19:03 ` Matthias Lederhofer
  2006-05-23 20:19 ` Jakub Narebski
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2006-05-22 18:03 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git

On Mon, 22 May 2006, Jakub Narebski wrote:
>
> Linus Torvalds wrote:
>
> >			git repack -a
> >			#
> >			# Stupid sleep to make sure that nobody is still
> >			# using any unpacked objects after the pack got
> >			# generated
> >			#
> >			sleep 10
> >			git prune-packed
>
> Is it really necessary (on Linux at least)? Git boast it's atomicity...

I don't think it's necessary in practice.

But people _should_ realize that removing objects is very very special.
Whether it's done by "git prune-packed" or "git prune", that's a very
dangerous operation. "git prune" a lot more so than "git prune-packed",
of course (in fact, you should _never_ run "git prune" on a repository
that is active - you _will_ corrupt it).

Doing "git prune-packed" _should_ be mostly safe on UNIX, since the
objects all exist in packs, and anybody who already opened an object will
keep the fd open, and not even notice that the name is gone.

However, there is at least one race:

	object lookup                   "git repack -a -d"
	=============                   ==================

	- a process does its object
	  database setup. No new
	  pack-file yet.

	                                - mv tmp-packfile active-packfile
	                                - git prune-packed

	- the process looks up the
	  object, and doesn't look in
	  the pack-file because it
	  didn't see the pack-file.

So it tries to look up an object, fails, and errors out. It's not a fatal
error (just re-try) but it could break something like a cvsimport.

Now, in PRACTICE, I doubt you'd ever hit this. But the fact is, pruning
your repository (whether prune-packed or a full prune) is _the_ special
operation. It's something that removes a filesystem representation of an
object that is otherwise immutable.

		Linus

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 18:03 ` Linus Torvalds @ 2006-05-22 19:03 ` Matthias Lederhofer 2006-05-22 19:09 ` Junio C Hamano 2006-05-23 20:19 ` Jakub Narebski 1 sibling, 1 reply; 83+ messages in thread From: Matthias Lederhofer @ 2006-05-22 19:03 UTC (permalink / raw) To: git > But people _should_ realize that removing objects is very very special. Just a similar question: is there any reason not to run git repack/prune-packed as a cron job? I am thinking of something like this for every night: - git prune-packed (remove the objects packed last time) - check how many objects git-count-objects counts; if there are not enough, abort - git repack git repack -a -d is probably a bad idea, I guess, because a program could try to open the old packs after they were deleted. Is there any way to delete unnecessary packs (those which repack -a -d would delete)? That would make it possible to do a git repack -a and delete those packs the next night. ^ permalink raw reply [flat|nested] 83+ messages in thread
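[Editorial sketch] The nightly sequence proposed above (prune-packed, count, then repack only if there is enough to pack) might look roughly like this. It is a minimal sketch, assuming that plain `git count-objects` prints a line of the form "N objects, M kilobytes"; the function names and the `MIN_LOOSE` threshold are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Assumed threshold: skip the repack unless at least this many loose
# objects exist. The value 1000 is arbitrary.
MIN_LOOSE=${MIN_LOOSE:-1000}

# Extract the loose-object count from a "N objects, M kilobytes" line.
loose_count() {
    printf '%s\n' "${1%% *}"
}

# Succeed when the count in the given line reaches the threshold.
should_repack() {
    [ "$(loose_count "$1")" -ge "$MIN_LOOSE" ]
}

# Run inside the repository, e.g. from cron.
nightly_maintenance() {
    # Remove loose objects that already live in a pack (from last night).
    git prune-packed
    # Only create a new incremental pack if it is worth it.
    if should_repack "$(git count-objects)"; then
        git repack
    fi
}
```

The block only defines functions; a cron entry would change into the repository and call `nightly_maintenance`. Deferring prune-packed to the following night leaves a long gap between creating a pack and deleting the loose copies, which is the same idea as the "sleep 10" earlier in the thread, stretched to a day.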
* Re: irc usage.. 2006-05-22 19:03 ` Matthias Lederhofer @ 2006-05-22 19:09 ` Junio C Hamano 0 siblings, 0 replies; 83+ messages in thread From: Junio C Hamano @ 2006-05-22 19:09 UTC (permalink / raw) To: Matthias Lederhofer; +Cc: git Matthias Lederhofer <matled@gmx.net> writes: > ... Is there any way to > delete unnecessary packs (those which would repack -a -d delete)? > Making it possible to do a git repack -a and delete those packs the > next night? pack-redundant is supposed to figure it out, but I have never used it myself so your mileage may vary. ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 18:03 ` Linus Torvalds 2006-05-22 19:03 ` Matthias Lederhofer @ 2006-05-23 20:19 ` Jakub Narebski 1 sibling, 0 replies; 83+ messages in thread From: Jakub Narebski @ 2006-05-23 20:19 UTC (permalink / raw) To: git Linus Torvalds wrote: > [...] people _should_ realize that removing objects is very very special. > Whether it's done by "git prune-packed" or "git prune", that's a very > dangerous operation. "git prune" a lot more so than "git prune-packed", > of course (in fact, you should _never_ run "git prune" on a repository > that is active - you _will_ corrupt it). Would it be possible to make the 'git prune' command safe against repository corruption, even if some information might be lost (like 'git add')? Or does _corruption_ mean that only some recoverable information is lost? One cannot always use a "one repository per developer" workflow. One solution would be to use a reader/writer lock (filesystem semaphore), with each command that modifies the repository taking the lock, and git-prune waiting on the lock until no one is accessing the repository. Of course the problem is with OSes and filesystems which do not support locking, and with stale locks... A second solution would be to [optionally] wait until no process is accessing the repository, copy the repository to some safe place, [optionally] calculate a checksum, prune, [optionally] check whether the repository was modified meanwhile and either abort or repeat, and finally copy the pruned repository back. -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 17:27 ` Linus Torvalds 2006-05-22 17:51 ` Jakub Narebski @ 2006-05-22 19:46 ` Martin Langhoff 1 sibling, 0 replies; 83+ messages in thread From: Martin Langhoff @ 2006-05-22 19:46 UTC (permalink / raw) To: Linus Torvalds Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/23/06, Linus Torvalds <torvalds@osdl.org> wrote: > Ok, initial results are promising. git-cvsimport appears to be still > slowly growing, but it's at 40M (ie pretty tiny, considering that cvsps > grew to 800+MB on this archive) and growth seems to actually be slowing. That's great news. The cvs archive seems to have large commits every once in a while, so I suspect the residual memory growth may be related to those. Or to a smaller leak I haven't nailed. My test box is bloody slow it seems. I'll try and get hold of a faster machine to run this if I can. > As to packing, it doing something like Given that we are running batch, it is safe and simple to stop the import, repack, prune-packed, and keep going. Don't think we'll win any races by running it in parallel ;-) cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 12:54 ` Martin Langhoff 2006-05-22 17:27 ` Linus Torvalds @ 2006-05-22 19:09 ` Donnie Berkholz 2006-05-22 19:38 ` Linus Torvalds ` (2 more replies) 1 sibling, 3 replies; 83+ messages in thread From: Donnie Berkholz @ 2006-05-22 19:09 UTC (permalink / raw) To: Martin Langhoff Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 1530 bytes --] Martin Langhoff wrote: > On 5/22/06, Linus Torvalds <torvalds@osdl.org> wrote: >> On Mon, 22 May 2006, Martin Langhoff wrote: >> > >> > Or a slow leak in Perl? The 5.8.8 release notes do talk about some >> > leaks being fixed, but this 5.8.8 isn't making a difference. >> > >> > Working on it. >> >> Thanks. Looking at what I did convert, that horrid gentoo CVS tree is >> interesting. The resulting (partial) git history has 93413 commits and >> 850,000+ objects total, all in a totally linear history. > > Ok, so there's 3 patches posted that should help narrow down the > problem. There's a new -L <imit> so that Donnie can get his stuff done > by running it in a while(true) loop. Not proud of it, but hey. > > And there are two patches that I suspect may fix the leak. After > applying them, the cvsimport process grows up to ~13MB and then tapers > off, at least as far as my patience has gotten me. It's late on this > side of the globe so I'll look at the results tomorrow morning. OK, I started a new run without -L, and I'm watching it in top right now. The cvsimport seems to be doing alright, but the cvs server process sucks about another megabyte of virtual every 4-5 seconds. This is a bit concerning since I don't have any swap. Shortly after it hit 670M, I got "Cannot allocate memory" again. I've got a gig of RAM, and around 300M was resident in various processes at the time. So it seems the problem is in cvs itself. I will try another run with -L now. 
Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 19:09 ` Donnie Berkholz @ 2006-05-22 19:38 ` Linus Torvalds 2006-05-22 19:49 ` Donnie Berkholz 2006-05-22 19:41 ` Martin Langhoff 2006-05-22 20:16 ` irc usage Donnie Berkholz 2 siblings, 1 reply; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 19:38 UTC (permalink / raw) To: Donnie Berkholz Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On Mon, 22 May 2006, Donnie Berkholz wrote: > > OK, I started a new run without -L, and I'm watching it in top right > now. The cvsimport seems to be doing alright, but the cvs server process > sucks about another megabyte of virtual every 4-5 seconds. This is a bit > concerning since I don't have any swap. Shortly after it hit 670M, I got > "Cannot allocate memory" again. I've got a gig of RAM, and around 300M > was resident in various processes at the time. Hmm. My cvs server doesn't really grow at all. It's at 13M RSS. What version of cvs are you running? [torvalds@g5 ~]$ cvs --version Concurrent Versions System (CVS) 1.11.21 (client/server) maybe that matters. (but my import is only up to Jun 22, 2003 so far). Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 19:38 ` Linus Torvalds @ 2006-05-22 19:49 ` Donnie Berkholz 2006-05-22 20:20 ` Linus Torvalds 0 siblings, 1 reply; 83+ messages in thread From: Donnie Berkholz @ 2006-05-22 19:49 UTC (permalink / raw) To: Linus Torvalds Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 586 bytes --] Linus Torvalds wrote: > Hmm. My cvs server doesn't really grow at all. It's at 13M RSS. Yeah, that's the thing. RSS stayed about the same (according to top), but virtual just kept growing. > What version of cvs are you running? > > [torvalds@g5 ~]$ cvs --version > > Concurrent Versions System (CVS) 1.11.21 (client/server) Concurrent Versions System (CVS) 1.12.12 (client/server) Looks like there's a .13 out but the zlib interaction is badly broken (-z >=1) so my system didn't get upgraded. I'll try it anyway after the -L run finishes. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 19:49 ` Donnie Berkholz @ 2006-05-22 20:20 ` Linus Torvalds 2006-05-22 21:48 ` Donnie Berkholz 0 siblings, 1 reply; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 20:20 UTC (permalink / raw) To: Donnie Berkholz Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On Mon, 22 May 2006, Donnie Berkholz wrote: > > Linus Torvalds wrote: > > Hmm. My cvs server doesn't really grow at all. It's at 13M RSS. > > Yeah, that's the thing. RSS stayed about the same (according to top), > but virtual just kept growing. Not for me. The virtual size is certainly bigger than RSS, but not by a huge amount. So this might be a regression in CVS, since you seem to have a newer version than I do. The latest stable CVS release is 1.11.21, I think: you seem to be running the "development" version (1.12.x). Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 20:20 ` Linus Torvalds @ 2006-05-22 21:48 ` Donnie Berkholz 2006-05-29 21:54 ` Donnie Berkholz 0 siblings, 1 reply; 83+ messages in thread From: Donnie Berkholz @ 2006-05-22 21:48 UTC (permalink / raw) To: Linus Torvalds Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 233 bytes --] Linus Torvalds wrote: > The latest stable CVS release is 1.11.21, I think: you seem to be running > the "development" version (1.12.x). Backed down to the 1.11 series, things seem to be going fine so far. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 21:48 ` Donnie Berkholz @ 2006-05-29 21:54 ` Donnie Berkholz 2006-05-29 22:21 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Donnie Berkholz @ 2006-05-29 21:54 UTC (permalink / raw) To: Donnie Berkholz Cc: Linus Torvalds, Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 406 bytes --] Donnie Berkholz wrote: > Linus Torvalds wrote: >> The latest stable CVS release is 1.11.21, I think: you seem to be running >> the "development" version (1.12.x). > > Backed down to the 1.11 series, things seem to be going fine so far. Finally hit an OOM sometime in the past day (yep, a week later) =\. Not sure whether it was cvsimport or cvs. Anyone else had more luck? Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-29 21:54 ` Donnie Berkholz @ 2006-05-29 22:21 ` Martin Langhoff 2006-05-29 22:32 ` Donnie Berkholz 0 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-29 22:21 UTC (permalink / raw) To: Donnie Berkholz Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/30/06, Donnie Berkholz <spyderous@gentoo.org> wrote: > Donnie Berkholz wrote: > > Linus Torvalds wrote: > >> The latest stable CVS release is 1.11.21, I think: you seem to be running > >> the "development" version (1.12.x). > > > > Backed down to the 1.11 series, things seem to be going fine so far. > > Finally hit an OOM sometime in the past day (yep, a week later) =\. Not > sure whether it was cvsimport or cvs. Anyone else had more luck? It seemed like it had finished on the machine I was running it, and I assumed it was alright in yours too. Looking closer it only made it till April 2004 -- but it may have been killed by a sysadmin, the captured log talks about 'signal 9', I have no idea what the OOM sends. It had done 285070 of 343822 patchsets. Have you dropped the -a from the git-repack invocation? That should help. Try also Linus' patch for git-rev-list. The other thing hurting us is that the commits are _huge_. I wonder how you guys were managing this with CVS. Now _this_ explains why cvsimport grows humongous. I'll try to rework the commit loop so that we don't need to hold all the filenames in memory. It seems to be choking with the commits after April 2004. But that will have to wait till tonight. cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-29 22:21 ` Martin Langhoff @ 2006-05-29 22:32 ` Donnie Berkholz 2006-05-30 0:19 ` Martin Langhoff ` (2 more replies) 0 siblings, 3 replies; 83+ messages in thread From: Donnie Berkholz @ 2006-05-29 22:32 UTC (permalink / raw) To: Martin Langhoff Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 2375 bytes --] Martin Langhoff wrote: > On 5/30/06, Donnie Berkholz <spyderous@gentoo.org> wrote: >> Finally hit an OOM sometime in the past day (yep, a week later) =\. Not >> sure whether it was cvsimport or cvs. Anyone else had more luck? > > It seemed like it had finished on the machine I was running it, and I > assumed it was alright in yours too. Looking closer it only made it > till April 2004 -- but it may have been killed by a sysadmin, the > captured log talks about 'signal 9', I have no idea what the OOM > sends. Looking closer, I see that the memory suckers do appear to be git, from dmesg: Out of Memory: Kill process 17230 (git-repack) score 97207 and children. Out of memory: Killed process 17231 (git-rev-list). 
Just ends like this: Tree ID 2cc632e5e1d3a430a2cc891bf33c4a12f19a4d0e Parent ID ad92d7073a52458e0581633bbd8ccbbec838d9e6 Committed patch 249100 (origin 2005-08-20 05:05:58) Commit ID 28941f00d714f57ab49f1fd725d1c3ce8a5d0b93 Fetching sys-kernel/ck-sources/ChangeLog v 1.113 Update sys-kernel/ck-sources/ChangeLog: 25425 bytes Fetching sys-kernel/ck-sources/Manifest v 1.164 Update sys-kernel/ck-sources/Manifest: 252 bytes Delete sys-kernel/ck-sources/ck-sources-2.6.12_p5-r1.ebuild Fetching sys-kernel/ck-sources/ck-sources-2.6.12_p6.ebuild v 1.1 New sys-kernel/ck-sources/ck-sources-2.6.12_p6.ebuild: 1438 bytes Delete sys-kernel/ck-sources/files/digest-ck-sources-2.6.12_p5-r1 Fetching sys-kernel/ck-sources/files/digest-ck-sources-2.6.12_p6 v 1.1 New sys-kernel/ck-sources/files/digest-ck-sources-2.6.12_p6: 279 bytes Can't fork at /usr/bin/git-cvsimport line 592, <CVS> line 3810053. cat: write error: Broken pipe > It had done 285070 of 343822 patchsets. > > Have you dropped the -a from the git-repack invocation? That should > help. Try also Linus' patch for git-rev-list. The other thing hurting > us is that the commits are _huge_. I wonder how you guys were managing > this with CVS. Now _this_ explains why cvsimport grows humongous. I wasn't running with a version that did repacks; I just suspended the cvsimport a couple of times and ran a repack manually. > I'll try to rework the commit loop so that we don't need to hold all > the filenames in memory. It seems to be choking with the commits after > April 2004. But that will have to wait till tonight. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-29 22:32 ` Donnie Berkholz @ 2006-05-30 0:19 ` Martin Langhoff 2006-05-30 5:31 ` Donnie Berkholz 2006-05-30 0:43 ` Linus Torvalds 2006-05-30 22:31 ` Martin Langhoff 2 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-30 0:19 UTC (permalink / raw) To: Donnie Berkholz Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/30/06, Donnie Berkholz <spyderous@gentoo.org> wrote: > Looking closer, I see that the memory suckers do appear to be git, from > dmesg: > > Out of Memory: Kill process 17230 (git-repack) score 97207 and children. > Out of memory: Killed process 17231 (git-rev-list). That would mean that you do have Linus' patch then. Grep cvsimport for repack and remove the -a -- and consider using his recent patch to rev-list. My dmesg talks about an earlier cvs segfault. Nasty tree you have here -- it's breaking all sorts of things... and teaching us a thing or two about the import process. > Committed patch 249100 (origin 2005-08-20 05:05:58) Hmmm? How can you be at patch 249100 and still be a good year ahead of me? Have you told cvsps to cut off old history? Another thing I found is that this import uses a lot of $TMPDIR, so if your TMPDIR is small, you'll hit all sorts of problems. cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-30 0:19 ` Martin Langhoff @ 2006-05-30 5:31 ` Donnie Berkholz 2006-05-30 6:01 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Donnie Berkholz @ 2006-05-30 5:31 UTC (permalink / raw) To: Martin Langhoff Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 1450 bytes --] Martin Langhoff wrote: > On 5/30/06, Donnie Berkholz <spyderous@gentoo.org> wrote: >> Looking closer, I see that the memory suckers do appear to be git, from >> dmesg: >> >> Out of Memory: Kill process 17230 (git-repack) score 97207 and children. >> Out of memory: Killed process 17231 (git-rev-list). > > That would mean that you do have Linus' patch then. Grep cvsimport for > repack and remove the -a -- and consider using his recent patch to > rev-list. You certainly would think so, and I did as well, but available evidence indicates otherwise. I'm not sure how the repack got in there. donnie@supernova ~ $ type git-cvsimport git-cvsimport is /usr/bin/git-cvsimport donnie@supernova ~ $ grep repack /usr/bin/git-cvsimport donnie@supernova ~ $ All I can think of is that I somehow OOM'd when I manually ran a repack and didn't notice it. But that should've at least made me unable to resume the cvsimport process, which happily kept chugging along later on. > My dmesg talks about an earlier cvs segfault. Nasty tree you have here > -- it's breaking all sorts of things... and teaching us a thing or two > about the import process. > >> Committed patch 249100 (origin 2005-08-20 05:05:58) > > Hmmm? How can you be at patch 249100 and still be a good year ahead of > me? Have you told cvsps to cut off old history? Nope. I ran the exact cvsps flags you posted earlier to create it. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-30 5:31 ` Donnie Berkholz @ 2006-05-30 6:01 ` Martin Langhoff 0 siblings, 0 replies; 83+ messages in thread From: Martin Langhoff @ 2006-05-30 6:01 UTC (permalink / raw) To: Donnie Berkholz Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/30/06, Donnie Berkholz <spyderous@gentoo.org> wrote: > All I can think of is that I somehow OOM'd when I manually ran a repack > and didn't notice it. But that should've at least made me unable to > resume the cvsimport process, which happily kept chugging along later on. Sounds likely -- and cvsimport restarts gracefully, though you might want to do git checkout HEAD to get a usable checkout if the very first import failed. However, the default head is master, and what you want to look at is origin or whatever you passed as your -o parameter. I use cvshead normally, so I do git log cvshead > > My dmesg talks about an earlier cvs segfault. Nasty tree you have here > > -- it's breaking all sorts of things... and teaching us a thing or two > > about the import process. > > > >> Committed patch 249100 (origin 2005-08-20 05:05:58) > > > > Hmmm? How can you be at patch 249100 and still be a good year ahead of > > me? Have you told cvsps to cut off old history? > > Nope. I ran the exact cvsps flags you posted earlier to create it. Oh, that was an earlier PEBKAK at my end: I did git log HEAD instead of git log cvshead. My import is now at 293145 (cvshead +0000 2005-12-25 12:24:42) which looks promising. cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-29 22:32 ` Donnie Berkholz 2006-05-30 0:19 ` Martin Langhoff @ 2006-05-30 0:43 ` Linus Torvalds 2006-05-30 22:31 ` Martin Langhoff 2 siblings, 0 replies; 83+ messages in thread From: Linus Torvalds @ 2006-05-30 0:43 UTC (permalink / raw) To: Donnie Berkholz Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On Mon, 29 May 2006, Donnie Berkholz wrote: > > Looking closer, I see that the memory suckers do appear to be git, from > dmesg: > > Out of Memory: Kill process 17230 (git-repack) score 97207 and children. > Out of memory: Killed process 17231 (git-rev-list). Sounds like you had the "git repack -a -d" thing in your cvsimport. The current git rev-list should use only about a third of the memory of the one you used, so hopefully you could just update your git version, and then continue with the "git cvsimport" without having to start all over. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-29 22:32 ` Donnie Berkholz 2006-05-30 0:19 ` Martin Langhoff 2006-05-30 0:43 ` Linus Torvalds @ 2006-05-30 22:31 ` Martin Langhoff 2006-05-30 23:07 ` Linus Torvalds 2 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-30 22:31 UTC (permalink / raw) To: Donnie Berkholz Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/30/06, Donnie Berkholz <spyderous@gentoo.org> wrote: > Martin Langhoff wrote: > > On 5/30/06, Donnie Berkholz <spyderous@gentoo.org> wrote: > >> Finally hit an OOM sometime in the past day (yep, a week later) =\. Not > >> sure whether it was cvsimport or cvs. Anyone else had more luck? With the latest cvsimport in Junio's repo, a lot of RAM and a bit of patience... gitview http://git.catalyst.net.nz/gitweb?p=gentoo.git;a=summary fetchable http://git.catalyst.net.nz/git/gentoo.git#cvshead Still pushing it, will be there in a minute or so. The packed repo weighs about 660MB. Not too bad given the size of the project and the number of commits. martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-30 22:31 ` Martin Langhoff @ 2006-05-30 23:07 ` Linus Torvalds 2006-05-31 1:04 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Linus Torvalds @ 2006-05-30 23:07 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On Wed, 31 May 2006, Martin Langhoff wrote: > > gitview > http://git.catalyst.net.nz/gitweb?p=gentoo.git;a=summary Heh. I think you should enable caching in your apache config. And maybe we should make that part of the gitweb docs. Without a caching web-server, gitweb is pretty slow, but it caches _beautifully_. That gentoo repo has a lot of "duplicate" commits that cvsps will mark as two separate commits because there's one commit for the files, and one commit for whatever the "Manifest" file is. I wonder if those commits should generally be merged or something. That said, things like that are most easily fixed as a git->git update (along with adding name translation), which can avoid re-writing the trees. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-30 23:07 ` Linus Torvalds @ 2006-05-31 1:04 ` Martin Langhoff 2006-05-31 2:49 ` Donnie Berkholz 0 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-31 1:04 UTC (permalink / raw) To: Linus Torvalds Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/31/06, Linus Torvalds <torvalds@osdl.org> wrote: > On Wed, 31 May 2006, Martin Langhoff wrote: > > > > gitview > > http://git.catalyst.net.nz/gitweb?p=gentoo.git;a=summary > > Heh. I think you should enable caching in your apache config. I know I should -- but I'm hoping to find the time to rework gitweb a bit to actually work fast instead. It bothers me that it is so slow on a basically idle machine, and where I can perform the corresponding git operations in the commandline in a blink. And caching is great for really busy sites (aka kernel.org) but git.catalyst.net.nz only serves a handful of small repos for a small group of people, and is 99% idle. Should blaze through this stuff. > That gentoo repo has a lot of "duplicate" commits that cvsps will mark as > two separate commits because there's one commit for the files, and one > commit for whatever the "Manifest" file is. I wonder if those commits > should generally be merged or something. > > That said, things like that are most easily fixed as a git->git update > (along with adding name translation), which can avoid re-writing the > trees. Yep, large projects often have good reasons to run custom imports, merging certain commits, rewriting log messages (like the X.org guys were doing). It can be done at the cvsimport stage or later -- I think Pasky has a rewritehistory tool hidden somewhere in Cogito, but I haven't used it. cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-31 1:04 ` Martin Langhoff @ 2006-05-31 2:49 ` Donnie Berkholz 2006-05-31 6:05 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Donnie Berkholz @ 2006-05-31 2:49 UTC (permalink / raw) To: Martin Langhoff Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin, Alec Warner [-- Attachment #1: Type: text/plain, Size: 969 bytes --] Martin Langhoff wrote: > On 5/31/06, Linus Torvalds <torvalds@osdl.org> wrote: >> That gentoo repo has a lot of "duplicate" commits that cvsps will mark as >> two separate commits because there's one commit for the files, and one >> commit for whatever the "Manifest" file is. I wonder if those commits >> should generally be merged or something. >> >> That said, things like that are most easily fixed as a git->git update >> (along with adding name translation), which can avoid re-writing the >> trees. > > Yep, large projects often have good reasons to run custom imports, > merging certain commits, rewriting log messages (like the X.org guys > were doing). It can be done at the cvsimport stage or later -- I think > Pasky has a rewritehistory tool hidden somewhere in Cogito, but I > haven't used it. We've got a guy who got a Summer of Code project to work on CVS migration, so this could be something along his lines. Thanks, Donnie [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-31 2:49 ` Donnie Berkholz @ 2006-05-31 6:05 ` Martin Langhoff 2006-05-31 13:54 ` Alec Warner 0 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-31 6:05 UTC (permalink / raw) To: Donnie Berkholz Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin, Alec Warner On 5/31/06, Donnie Berkholz <spyderous@gentoo.org> wrote: > We've got a guy who got a Summer of Code project to work on CVS > migration, so this could be something along his lines. He'll want a fast box to wrangle with this repo ;-) martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-31 6:05 ` Martin Langhoff @ 2006-05-31 13:54 ` Alec Warner 2006-05-31 22:03 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Alec Warner @ 2006-05-31 13:54 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin Martin Langhoff wrote: > On 5/31/06, Donnie Berkholz <spyderous@gentoo.org> wrote: >> We've got a guy who got a Summer of Code project to work on CVS >> migration, so this could be something along his lines. > > He'll want a fast box to wrangle with this repo ;-) > > > martin I have a dual opteron with 4gb of ram "on loan" from work :) It still dies though, using git cvsimport or parsecvs. I talked to Keith Packard about adding support to parsecvs for recording the actual changed changesets, but I haven't yet started on implementing that since he isn't using cvsps in parsecvs. I also haven't had a chance to look at the git-cvsimport sources yet, was hoping to get to that later this week. ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-31 13:54 ` Alec Warner @ 2006-05-31 22:03 ` Martin Langhoff 2006-06-01 1:42 ` Alec Warner 0 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-31 22:03 UTC (permalink / raw) To: antarus Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 6/1/06, Alec Warner <antarus@gentoo.org> wrote: > I have a dual opteron with 4gb of ram "on loan" from work :) > > It still dies though, using git cvsimport or parsecvs. The machine I am running this is more constrained than that, and it doesn't die. It just takes maybe 30hs. Make sure it's not a bad cvs binary you got there (latest from gentoo seems to leak memory). And if it's still dying... give us some more details ;-) cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-31 22:03 ` Martin Langhoff @ 2006-06-01 1:42 ` Alec Warner 2006-06-01 7:47 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Alec Warner @ 2006-06-01 1:42 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin Martin Langhoff wrote: > On 6/1/06, Alec Warner <antarus@gentoo.org> wrote: > >> I have a dual opteron with 4gb of ram "on loan" from work :) >> >> It still dies though, using git cvsimport or parsecvs. > > > The machine I am running this is more constrained than that, and it > doesn't die. It just takes maybe 30hs. Make sure it's not a bad cvs > binary you got there (latest from gentoo seems to leak memory). > > And if it's still dying... give us some more details ;-) > > cheers, > > > martin After reading the whole thread on this, I'm using a git checkout of git, cvsps-2.1 and cvs-1.11.12, running overnight in verbose mode with screen. Hopefully will have a repo in the morning ;) ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-06-01 1:42 ` Alec Warner @ 2006-06-01 7:47 ` Martin Langhoff 2006-06-05 0:33 ` Alec Warner 0 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-06-01 7:47 UTC (permalink / raw) To: antarus Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 6/1/06, Alec Warner <antarus@gentoo.org> wrote: > After reading the whole thread on this, I'm using a git checkout of > git, cvsps-2.1 and cvs-1.11.12, running overnight in verbose mode with > screen. Hopefully will have a repo in the morning ;) Good stuff. I am rerunning it to prove (and bench) a complete and uninterrupted import. So far it's done 4hs 30m, footprint grown to 207MB, 49750 commits. So I think it will be done in approx 30hs on this single-cpu opteron. Most commits are small, but there is a handful that are downright massive -- and we hold the whole file list in memory, which I think explains (most of) the memory growth. I've looked into avoiding holding the whole filelist in memory, but it involves rewriting the cvsps output parsing loop, which is better left for a rainy day, with a test case that doesn't take 30hs to resolve. cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-06-01 7:47 ` Martin Langhoff @ 2006-06-05 0:33 ` Alec Warner 2006-06-05 2:06 ` Martin Langhoff 0 siblings, 1 reply; 83+ messages in thread From: Alec Warner @ 2006-06-05 0:33 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin Martin Langhoff wrote: > On 6/1/06, Alec Warner <antarus@gentoo.org> wrote: > >> After reading the whole thread on this, I'm using a git checkout of >> git, cvsps-2.1 and cvs-1.11.12, running overnight in verbose mode with >> screen. Hopefully will have a repo in the morning ;) > > > Good stuff. I am rerunning it to prove (and bench) a complete and > uninterrupted import. So far it's done 4hs 30m, footprint grown to > 207MB, 49750 commits. So I think it will be done in approx 30hs on > this single-cpu opteron. > > Most commits are small, but there is a handful that are downright > massive -- and we hold all the file list in memory, which I think > explains (most of) the memory growth. I've looked into avoiding > holding the whole filelist in memory, but it involves rewriting the > cvsps output parsing loop, which is better left for a rainy day, with > a test case that doesn't take 30hs to resolve. Ok the box this was running on had issues, so I switched to using pearl.amd64.dev.gentoo.org, a dual core amd64 X2 4600+ with 4 gigs of ram and plenty of disk. The "problem" now is just conversion time...30 hours and I'm into 2004-09-17...but it's been in 2004 all day, seems like most of the commits are in the last three years. Are there architectural issues with doing this in parallel? Since the repository commits are all in cvs, it should be possible to do the work in parallel, since you know what all the commits touch. The concern would be ordering of nodes in the tree; you'd end up building a bunch of subtrees and patching them together? -Alec Warner ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-06-05 0:33 ` Alec Warner @ 2006-06-05 2:06 ` Martin Langhoff 2006-06-05 2:36 ` Alec Warner 0 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-06-05 2:06 UTC (permalink / raw) To: antarus Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 6/5/06, Alec Warner <antarus@gentoo.org> wrote: > Ok the box this was running on had issues, so I switched to using > pearl.amd64.dev.gentoo.org, a dual core amd64 X2 4600+ with 4 gigs of > ram and plenty of disk. The "problem" now is just conversion time...30 > hours and I'm into 2004-09-17...but it's been in 2004 all day, seems > like most of the commits are in the last three years. Are there > architectural issues with doing this in parallel? I don't think you can do this in parallel. What I would do is remove the -a from the git-repack invocation. It does hurt import times quite a bit -- just do a git-repack -a -d when it's done. And... having said that, there is still a memory leak somehow, somewhere. It's been evading me for 2 weeks now, so I feel an idiot now. Not too bad in general, but it shows clearly in the gentoo and mozilla imports. > Since the repository commits are all in cvs, it should be possible to do > the work in parallel, since you know what all the commits touch. The > concern would be ordering of nodes in the tree; you'd end up building a > bunch of subtrees and patching them together? Well... parsecvs does a bit of this but in sequential fashion... it imports all the files first, and then runs through the history building the tree+commits in order, committing them. It saves a lot of time in the file imports by parsing the RCS file directly. 
The downside is that it must keep a filename+version=>sha1 mapping -- which I think is why parsecvs won't fit in memory until it's changed to store it on disk somehow ;-) You are forced to do it in a sequence because cvsps only tells you about the files added/removed/changed in a commit -- you need the ancestor to have a view of what the whole tree looked like. The only room for parallelism I see is to fork off new processes to work on branches in parallel. martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-06-05 2:06 ` Martin Langhoff @ 2006-06-05 2:36 ` Alec Warner 2006-06-05 3:49 ` Martin Langhoff [not found] ` <20060605120743.566fb85f.seanlkml@sympatico.ca> 0 siblings, 2 replies; 83+ messages in thread From: Alec Warner @ 2006-06-05 2:36 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin Martin Langhoff wrote: > On 6/5/06, Alec Warner <antarus@gentoo.org> wrote: > >> Ok the box this was running on had issues, so I switched to using >> pearl.amd64.dev.gentoo.org, a dual core amd64 X2 4600+ with 4 gigs of >> ram and plenty of disk. The "problem" now is just conversion time...30 >> hours and I'm into 2004-09-17...but it's been in 2004 all day, seems >> like most of the commits are in the last three years. Are there >> architectural issues with doing this in parallel? > > > I don't think you can do this in parallel. What I would do is remove > the -a from the git-repack invocation. It does hurt import times quite > a bit -- just do a git-repack -a -d when it's done. Only repack at the end then? disk space isn't an issue here so I'll give that a shot. > > And... having said that, there is still a memory leak somehow, > somewhere. It's been evading me for 2 weeks now, so I feel an idiot > now. Not too bad in general, but it shows clearly in the gentoo and > mozilla imports.
30565 antarus 17 0 470m 456m 1640 S 14 11.6 234:23.38 git-cvsimport
30566 antarus 16 0 6753m 147m 752 S 7 3.7 120:27.06 cvs
I'm on cvs-1.11.12 and the git version of git > You are forced to do it in a sequence because cvsps only tells you > about the files added/removed/changed in a commit -- you need the > ancestor to have a view of what the whole tree looked like. The only > room for parallelism I see is to fork off new processes to work on > branches in parallel. 
Not helpful in the Gentoo case, since we only have one branch; minus an accident when a dev branched gentoo-x86 a while back ;) I'll keep chugging on this one; it won't be the final import as I haven't used the complete Authors file, so I will try the repacking optimization next time I do an import. -Alec Warner ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-06-05 2:36 ` Alec Warner @ 2006-06-05 3:49 ` Martin Langhoff [not found] ` <20060605120743.566fb85f.seanlkml@sympatico.ca> 1 sibling, 0 replies; 83+ messages in thread From: Martin Langhoff @ 2006-06-05 3:49 UTC (permalink / raw) To: antarus Cc: Donnie Berkholz, Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 6/5/06, Alec Warner <antarus@gentoo.org> wrote: > > I don't think you can do this in parallel. What I would do is remove > > the -a from the git-repack invocation. It does hurt import times quite > > a bit -- just do a git-repack -a -d when it's done. > > Only repack at the end then? disk space isn't an issue here so I'll give > that a shot. Not exactly -- by removing the -a from the git-repack invocation what you get is cheap "partial" packing rather than a full repack. This is somewhat inefficient disk-wise, perhaps by 10% or so. But full repacks get more and more expensive as the repo grows. So you don't need to run git-repack -a -d at the end, but it will be a good measure to see how compact the packing gets. > > And... having said that, there is still a memory leak somehow, > > somewhere. It's been evading me for 2 weeks now, so I feel an idiot > > now. Not too bad in general, but it shows clearly in the gentoo and > > mozilla imports. > > 30565 antarus 17 0 470m 456m 1640 S 14 11.6 234:23.38 > git-cvsimport > 30566 antarus 16 0 6753m 147m 752 S 7 3.7 120:27.06 cvs > > I'm on cvs-1.11.12 and the git version of git Yep, I see roughly the same. It grows slowly and I don't know why :( > I'll keep chugging on this one; it won't be the final import as I > haven't used the complete Authors file, so I will try the repacking > optimization next time I do an import. Cool. If it dies for any reason, just do
    git-update-ref refs/heads/master refs/heads/origin
    git-update-ref HEAD origin
    git-checkout
You only need to do this the first time -- after that, the core heads are set. 
Rerun the script and it will pick up where it left off. If it dies again, just do git-checkout to see the latest files. (Above, replace origin with your -o option if you are using it. I normally use -o cvshead.) martin ^ permalink raw reply [flat|nested] 83+ messages in thread
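The recovery steps in the message above can be collected into a small helper. This is only a sketch under stated assumptions, not part of git-cvsimport: the function name recover_heads and the GIT_RUN dry-run switch are hypothetical, and the commands are the dashed 2006-era forms quoted in the message. Unless GIT_RUN is set, it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: recover_heads and GIT_RUN are illustrative names, not git features.
# $1 is the branch cvsimport tracks ("origin" unless -o was given).
recover_heads() {
    tracking=${1:-origin}
    # With GIT_RUN unset, "echo" prints each command instead of executing it.
    # With GIT_RUN set to the empty string, the commands run directly.
    run=${GIT_RUN-echo}
    $run git-update-ref refs/heads/master "refs/heads/$tracking" &&
    $run git-update-ref HEAD "$tracking" &&
    $run git-checkout
}
```

For an import started with -o cvshead, `recover_heads cvshead` shows the three commands, and `GIT_RUN= recover_heads cvshead` would execute them. As the message notes, this is only needed the first time.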
[parent not found: <20060605120743.566fb85f.seanlkml@sympatico.ca>]
* Re: irc usage.. [not found] ` <20060605120743.566fb85f.seanlkml@sympatico.ca> @ 2006-06-05 16:07 ` Sean 0 siblings, 0 replies; 83+ messages in thread From: Sean @ 2006-06-05 16:07 UTC (permalink / raw) To: antarus Cc: martin.langhoff, spyderous, torvalds, ydirson, git, smurf, Johannes.Schindelin On Sun, 04 Jun 2006 22:36:44 -0400 Alec Warner <antarus@gentoo.org> wrote: > I'll keep chugging on this one; it won't be the final import as I > haven't used the complete Authors file, so I will try the repacking > optimization next time I do an import. Hi Alec, You may want to go back and do another import for other reasons, but if the only reason is to fix up the author information it would be _much_ faster to simply rewrite the git commit history. Cogito has something called "cg-admin-rewritehist" which should do what you need and there are other scripts floating around specifically for rewriting just the author information. HTH, Sean ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 19:09 ` Donnie Berkholz 2006-05-22 19:38 ` Linus Torvalds @ 2006-05-22 19:41 ` Martin Langhoff 2006-05-22 20:11 ` Linus Torvalds 2006-05-22 20:16 ` irc usage Donnie Berkholz 2 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-22 19:41 UTC (permalink / raw) To: Donnie Berkholz Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On 5/23/06, Donnie Berkholz <spyderous@gentoo.org> wrote: > So it seems the problem is in cvs itself. I will try another run with -L > now. What version of cvs are you using? Perhaps trying a different one? The dev machine where I am running the import is a slug! It's still working on it, only gotten to 7700 commits, with the cvsimport process stable at 28MB RAM and cvs stable at 4MB. cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 19:41 ` Martin Langhoff @ 2006-05-22 20:11 ` Linus Torvalds 2006-05-22 20:33 ` Linus Torvalds 2006-05-22 21:41 ` Matthias Urlichs 0 siblings, 2 replies; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 20:11 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On Tue, 23 May 2006, Martin Langhoff wrote: > > The dev machine where I am running the import is a slug! It's still > working on it, only gotten to 7700 commits, with the cvsimport process > stable at 28MB RAM and cvs stable at 4MB. I have to say, that cvsimport script really does do horrible things. It's basically a fork/exec/exit benchmark, as far as I can tell. Running oprofile on the thing, the top offenders are (ignore the 45% idle thing: it's just because this was run on a dual-cpu system, so since it's almost completely single-threaded you get ~50% idle by default).
3117654  45.8708  vmlinux         vmlinux         .power4_idle
 802313  11.8046  vmlinux         vmlinux         .unmap_vmas
 632913   9.3122  vmlinux         vmlinux         .copy_page_range
 150359   2.2123  vmlinux         vmlinux         .release_pages
 131330   1.9323  vmlinux         vmlinux         .vm_normal_page
 117836   1.7337  libperl.so      libperl.so      (no symbols)
  74098   1.0902  libgklayout.so  libgklayout.so  (no symbols)
  54680   0.8045  vmlinux         vmlinux         .free_pages_and_swap_cache
  54300   0.7989  libfb.so        libfb.so        (no symbols)
  49052   0.7217  vmlinux         vmlinux         .copy_4K_page
  46559   0.6850  libc-2.4.so     libc-2.4.so     getc
  42677   0.6279  vmlinux         vmlinux         .page_remove_rmap
  41133   0.6052  libc-2.4.so     libc-2.4.so     ferror
.. those kernel functions are all about process create/exit, and COW faulting after the fork. Now, this is on ppc, so process creation is likely slower (idiotic PPC VM page table hashes), but Linux is actually very good at doing this, and the fact that process create/exit is so high is a very big sign that the script just ends up executing a _ton_ of small simple processes that do almost nothing. 
I wonder why those "git-update-index" calls seem to be (assuming I read the perl correctly) done only a few files at a time. We can do hundreds in one go, but it seems to want to do just ten files or something at the same time. Although since most commits should hopefully just modify a couple of files, that probably isn't a big deal. That thing would probably be an order of magnitude faster if written to use the git library interfaces directly. Of course, the CVS part is probably a big overhead, so it might not help much (I would not be surprised at all if a number of the fork/exec/exit things are due to the CVS server starting RCS or something, not due to git-cvsimport itself) Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 20:11 ` Linus Torvalds @ 2006-05-22 20:33 ` Linus Torvalds 2006-05-22 21:41 ` Matthias Urlichs 1 sibling, 0 replies; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 20:33 UTC (permalink / raw) To: Martin Langhoff Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs, Johannes Schindelin On Mon, 22 May 2006, Linus Torvalds wrote: > > Of course, the CVS part is probably a big overhead, so it might not help > much (I would not be surprised at all if a number of the fork/exec/exit > things are due to the CVS server starting RCS or something, not due to > git-cvsimport itself) Ahh. stracing the CVS server seems to imply that it forks off a subprocess for every command. It doesn't actually execute any external program, but just does a fork + muck around in the ,v files + exit. Maybe one of the changes in the 1.12.x versions is to not do that, which might explain why Donnie seems to see much better performance, but also sees all the memory leakage? Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 20:11 ` Linus Torvalds 2006-05-22 20:33 ` Linus Torvalds @ 2006-05-22 21:41 ` Matthias Urlichs 2006-05-22 22:18 ` Linus Torvalds 2006-05-22 22:39 ` Junio C Hamano 1 sibling, 2 replies; 83+ messages in thread From: Matthias Urlichs @ 2006-05-22 21:41 UTC (permalink / raw) To: Linus Torvalds Cc: Martin Langhoff, Donnie Berkholz, Yann Dirson, Git Mailing List, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 872 bytes --] Hi, Linus Torvalds: > I wonder why those "git-update-index" calls seem to be (assuming I read > the perl correctly) done only a few files at a time. We can do hundreds > in one go, but it seems to want to do just ten files or something at the > same time. No, fifty. I simply was too lazy to count the actual filenames' lengths. ;-) > That thing would probably be an order of magnitude faster if written to > use the git library interfaces directly. Of course, the CVS part is > probably a big overhead, so it might not help much The beast *was* mainly written to do this remotely... -- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de - - The worst form of inequality is to try to make unequal things equal. -- Aristotle [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 191 bytes --] ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 21:41 ` Matthias Urlichs @ 2006-05-22 22:18 ` Linus Torvalds 2006-05-22 23:23 ` Martin Langhoff 2006-05-22 22:39 ` Junio C Hamano 1 sibling, 1 reply; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 22:18 UTC (permalink / raw) To: Matthias Urlichs Cc: Martin Langhoff, Donnie Berkholz, Yann Dirson, Git Mailing List, Johannes Schindelin On Mon, 22 May 2006, Matthias Urlichs wrote: > > The beast *was* mainly written to do this remotely... I don't think the remote usability is valid, except for some really small repositories. The fact that it takes hours even when the CVS server is local doesn't bode well for doing it remotely for any but the most trivial things. I really think it would be better to have local use be the optimized case, with remote being the "it's _possible_" case. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 22:18 ` Linus Torvalds @ 2006-05-22 23:23 ` Martin Langhoff 2006-05-22 23:29 ` Martin Langhoff 2006-05-22 23:33 ` Linus Torvalds 0 siblings, 2 replies; 83+ messages in thread From: Martin Langhoff @ 2006-05-22 23:23 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List, Johannes Schindelin On 5/23/06, Linus Torvalds <torvalds@osdl.org> wrote: > I don't think the remote usability is valid, except for some really small > repositories. The fact that it takes hours even when the CVS server is > local doesn't bode well for doing it remotely for any but the most trivial > things. I really don't think that using the local cvs binary is a problem at all. In my experience, the thing is fairly fast and optimized when you ask it to perform file-oriented questions and that's all we do, really. If you want to try it, you'll see that local checkouts of large trees (like this gentoo one) are fairly fast. Not as fast as GIT itself, but good enough. I think Donnie has hit a bug with a bad version of cvs, but other than that, my experience with it is that it is fairly well behaved -- even if the tool is bad, ubiquity has led to resiliency over the years. > I really think it would be better to have local use be the optimized case, > with remote being the "it's _possible_" case. Agreed, but I think we won't see much benefit in direct parsing. And we'll have to take the hit of double-implementation. In any case, we have it already -- parsecvs does it quite well (modulo memory leaks!) and I've used it several times in conjunction with cvsimport. Just perform the initial import with parsecvs and then 'track' the remote project with cvsimport. The problem is that they lead to slightly different trees. So their output is not consistent, and I don't think that'll be easy to fix. cheers, martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 23:23 ` Martin Langhoff @ 2006-05-22 23:29 ` Martin Langhoff 2006-05-22 23:33 ` Linus Torvalds 1 sibling, 0 replies; 83+ messages in thread From: Martin Langhoff @ 2006-05-22 23:29 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List, Johannes Schindelin On 5/23/06, Martin Langhoff <martin.langhoff@gmail.com> wrote: > The problem is that they lead to slightly different trees. Sorry! s/trees/histories/ there. The trees are (or should be!) the same, and tree differences should be addressed as bugs. Differences in how history is parsed are unavoidable right now. martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 23:23 ` Martin Langhoff 2006-05-22 23:29 ` Martin Langhoff @ 2006-05-22 23:33 ` Linus Torvalds 1 sibling, 0 replies; 83+ messages in thread From: Linus Torvalds @ 2006-05-22 23:33 UTC (permalink / raw) To: Martin Langhoff Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List, Johannes Schindelin On Tue, 23 May 2006, Martin Langhoff wrote: > > I really don't think that using the local cvs binary is a problem at > all. In my experience, the thing is fairly fast and optimized when you > ask it to perform file-oriented questions and that's all we do, > really. Fair enough. My worry was mainly that the cvs server was doing something stupid, but I suspect most of the fork/exec's are probably from the cvsimport perl script itself. > In any case, we have it already -- parsecvs does it quite well (modulo > memory leaks!) and I've used it several times in conjunction with > cvsimport. Just perform the initial import with parsecvs and then > 'track' the remote project with cvsimport. I didn't get parsecvs working when I tried it a long time ago, and Donnie reported that it ran out of memory, so I didn't even really consider it. I'd love for it to work well, and it may be reasonable to do really big imports on multi-gigabyte 64-bit machines (after all, they aren't _hard_ to find any more, and you only need to do it once). That said, it still seems pretty stupid to require that much memory just to import from CVS. Linus ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 21:41 ` Matthias Urlichs 2006-05-22 22:18 ` Linus Torvalds @ 2006-05-22 22:39 ` Junio C Hamano 2006-05-22 23:15 ` Martin Langhoff 1 sibling, 1 reply; 83+ messages in thread From: Junio C Hamano @ 2006-05-22 22:39 UTC (permalink / raw) To: Matthias Urlichs; +Cc: git Matthias Urlichs <smurf@smurf.noris.de> writes: > Hi, > > Linus Torvalds: >> I wonder why those "git-update-index" calls seem to be (assuming I read >> the perl correctly) done only a few files at a time. We can do hundreds >> in one go, but it seems to want to do just ten files or something at the >> same time. > > No, fifty. > > I simply was too lazy to count the actual filenames' lengths. ;-) I think cvsimport predates that option, but these days that loop can be optimized by feeding --index-info from standard input. ^ permalink raw reply [flat|nested] 83+ messages in thread
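The --index-info input mentioned above is a stream of "mode SP sha1 TAB path" records on standard input, where a mode of 0 removes the path. The following is a minimal sketch of producing such records, assuming regular (non-symlink) files; the helper names index_info_add and index_info_remove are hypothetical and are not git commands.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of git.
NULL_SHA1=0000000000000000000000000000000000000000

# Record that drops a path from the index: mode 0 plus the all-zero sha1.
# $1 = path
index_info_remove() {
    printf '0 %s\t%s\n' "$NULL_SHA1" "$1"
}

# Record that adds or updates a regular file.
# $1 = permission bits (644 or 755), $2 = blob sha1, $3 = path
index_info_add() {
    printf '100%s %s\t%s\n' "$1" "$2" "$3"
}
```

All records for one commit can then go through a single process, for example `{ index_info_remove old.c; index_info_add 644 "$blob" new.c; } | git-update-index --index-info` (with `$blob` standing in for a real blob id), rather than one fork per batch of filenames on the command line.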
* Re: irc usage.. 2006-05-22 22:39 ` Junio C Hamano @ 2006-05-22 23:15 ` Martin Langhoff 2006-05-23 6:52 ` Jeff King 0 siblings, 1 reply; 83+ messages in thread From: Martin Langhoff @ 2006-05-22 23:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: Matthias Urlichs, git On 5/23/06, Junio C Hamano <junkio@cox.net> wrote: > > I simply was too lazy to count the actual filenames' lengths. ;-) > > I think cvsimport predates that option, but these days that loop > can be optimized by feeding --index-info from standard input. Oh, yep, that'd be a good addition. I think we can also cut down on the number of fork+exec calls (as Linus points out they are killing us) by caching some data we should already have that we are repeatedly asking from git-rev-parse. Other TODOs from my reading of the code last night... - Switch from line-oriented reads to block reads when fetching files from CVS. This gentoo repo has some large binary blobs in it and we end up slurping them into memory. - Stop abusing globals in commit() -- pass the commit data as parameters. - Further profiling? Whatever we are doing, we aren't doing it fast :( Will be trying to do those things in the next few days, don't mind if someone jumps in as well. martin ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-22 23:15 ` Martin Langhoff @ 2006-05-23 6:52 ` Jeff King 2006-05-23 6:58 ` Jeff King 2006-05-23 7:00 ` [PATCH 2/2] cvsimport: cleanup commit function Jeff King 0 siblings, 2 replies; 83+ messages in thread From: Jeff King @ 2006-05-23 6:52 UTC (permalink / raw) To: Martin Langhoff; +Cc: Junio C Hamano, Matthias Urlichs, git On Tue, May 23, 2006 at 11:15:07AM +1200, Martin Langhoff wrote: > >I think cvsimport predates that option, but these days that loop > >can be optimized by feeding --index-info from standard input. > Oh, yep, that'd be a good addition. I think we can also cut down on This patch is relatively simple, and I'll post it in a moment. I also made a few other cleanups to commit() which apply on top of that; I'll post it also. > - Stop abusing globals in commit() -- pass the commit data as parameters. Some of the globals actually get modified in commit() (e.g., @old and @new get cleared). So we need to either pass them in as references or remember to do that cleanup each time it is called (which is really only twice, I think). > Will be trying to do those things in the next few days, don't mind if > someone jumps in as well. I can look at the line/block CVS file slurping, but not tonight. -Peff ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: irc usage.. 2006-05-23 6:52 ` Jeff King @ 2006-05-23 6:58 ` Jeff King 2006-05-23 7:01 ` [PATCH 1/2] cvsimport: use git-update-index --index-info Jeff King 2006-05-23 7:00 ` [PATCH 2/2] cvsimport: cleanup commit function Jeff King 1 sibling, 1 reply; 83+ messages in thread From: Jeff King @ 2006-05-23 6:58 UTC (permalink / raw) To: Martin Langhoff, Junio C Hamano, Matthias Urlichs, git >From nobody Mon Sep 17 00:00:00 2001 From: Jeff King <peff@peff.net> Date: Tue, 23 May 2006 01:16:07 -0400 Subject: [PATCH 1/2] cvsimport: use git-update-index --index-info This should reduce the number of git-update-index forks required per commit. We now do adds/removes in one call, and we are no longer forced to deal with argv limitations. --- cb6452bbfda9c52ad8dbeaac6e3440ae61099a05 git-cvsimport.perl | 36 +++++++++++++----------------------- 1 files changed, 13 insertions(+), 23 deletions(-) cb6452bbfda9c52ad8dbeaac6e3440ae61099a05 diff --git a/git-cvsimport.perl b/git-cvsimport.perl index d257e66..4efb0a5 100755 --- a/git-cvsimport.perl +++ b/git-cvsimport.perl @@ -565,29 +565,19 @@ my($patchset,$date,$author_name,$author_ my(@old,@new,@skipped); sub commit { my $pid; - while(@old) { - my @o2; - if(@old > 55) { - @o2 = splice(@old,0,50); - } else { - @o2 = @old; - @old = (); - } - system("git-update-index","--force-remove","--",@o2); - die "Cannot remove files: $?\n" if $?; - } - while(@new) { - my @n2; - if(@new > 12) { - @n2 = splice(@new,0,10); - } else { - @n2 = @new; - @new = (); - } - system("git-update-index","--add", - (map { ('--cacheinfo', @$_) } @n2)); - die "Cannot add files: $?\n" if $?; - } + + open(my $fh, '|-', qw(git-update-index --index-info)) + or die "unable to open git-update-index: $!"; + print $fh + (map { "0 0000000000000000000000000000000000000000\t$_\n" } + @old), + (map { '100' . sprintf('%o', $_->[0]) . 
" $_->[1]\t$_->[2]\n" } + @new) + or die "unable to write to git-update-index: $!"; + close $fh + or die "unable to write to git-update-index: $!"; + $? and die "git-update-index reported error: $?"; + @old = @new = (); $pid = open(C,"-|"); die "Cannot fork: $!" unless defined $pid; -- 1.3.3.gcb64-dirty ^ permalink raw reply related [flat|nested] 83+ messages in thread
* [PATCH 1/2] cvsimport: use git-update-index --index-info 2006-05-23 6:58 ` Jeff King @ 2006-05-23 7:01 ` Jeff King 0 siblings, 0 replies; 83+ messages in thread From: Jeff King @ 2006-05-23 7:01 UTC (permalink / raw) To: Martin Langhoff, Junio C Hamano, Matthias Urlichs, git This should reduce the number of git-update-index forks required per commit. We now do adds/removes in one call, and we are no longer forced to deal with argv limitations. --- Oops, apparently using a mail reader is too challenging for me. Here's a repost with the headers correctly merged. cb6452bbfda9c52ad8dbeaac6e3440ae61099a05 git-cvsimport.perl | 36 +++++++++++++----------------------- 1 files changed, 13 insertions(+), 23 deletions(-) cb6452bbfda9c52ad8dbeaac6e3440ae61099a05 diff --git a/git-cvsimport.perl b/git-cvsimport.perl index d257e66..4efb0a5 100755 --- a/git-cvsimport.perl +++ b/git-cvsimport.perl @@ -565,29 +565,19 @@ my($patchset,$date,$author_name,$author_ my(@old,@new,@skipped); sub commit { my $pid; - while(@old) { - my @o2; - if(@old > 55) { - @o2 = splice(@old,0,50); - } else { - @o2 = @old; - @old = (); - } - system("git-update-index","--force-remove","--",@o2); - die "Cannot remove files: $?\n" if $?; - } - while(@new) { - my @n2; - if(@new > 12) { - @n2 = splice(@new,0,10); - } else { - @n2 = @new; - @new = (); - } - system("git-update-index","--add", - (map { ('--cacheinfo', @$_) } @n2)); - die "Cannot add files: $?\n" if $?; - } + + open(my $fh, '|-', qw(git-update-index --index-info)) + or die "unable to open git-update-index: $!"; + print $fh + (map { "0 0000000000000000000000000000000000000000\t$_\n" } + @old), + (map { '100' . sprintf('%o', $_->[0]) . " $_->[1]\t$_->[2]\n" } + @new) + or die "unable to write to git-update-index: $!"; + close $fh + or die "unable to write to git-update-index: $!"; + $? and die "git-update-index reported error: $?"; + @old = @new = (); $pid = open(C,"-|"); die "Cannot fork: $!" 
unless defined $pid; -- 1.3.3.gcb64-dirty ^ permalink raw reply related [flat|nested] 83+ messages in thread
* [PATCH 2/2] cvsimport: cleanup commit function 2006-05-23 6:52 ` Jeff King 2006-05-23 6:58 ` Jeff King @ 2006-05-23 7:00 ` Jeff King [not found] ` <7v4pzh6wtr.fsf@assigned-by-dhcp.cox.net> ` (3 more replies) 1 sibling, 4 replies; 83+ messages in thread From: Jeff King @ 2006-05-23 7:00 UTC (permalink / raw) To: Martin Langhoff, Junio C Hamano, Matthias Urlichs, git This change attempts to clean up the commit function to make it a bit easier to read (or at least the first half of it). It also improves robustness and performance. Specifically: - report get_headref errors on opening ref unless the error is ENOENT - use regex to check for sha1 instead of length - use lexically scoped filehandles which get cleaned up automagically - check for error on both 'print' and 'close' (since output is buffered) - avoid "fork, do some perl, then exec" in commit(). It's not necessary, and we probably end up COW'ing parts of the perl process. Plus the code is much smaller because we can use open2() - avoid calling strftime over and over (mainly a readability cleanup) --- I know this patch is quite large. I can try to split it if you want, but I suspect it's not worth the effort (either you like refactoring or you don't :) ). 
9dc9f05ab5e1cbd8765238e7b1da0addd6f4296a
 git-cvsimport.perl |  150 ++++++++++++++++++++++------------------------------
 1 files changed, 64 insertions(+), 86 deletions(-)

9dc9f05ab5e1cbd8765238e7b1da0addd6f4296a
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index 4efb0a5..f8feb52 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -23,7 +23,7 @@ use File::Basename qw(basename dirname);
 use Time::Local;
 use IO::Socket;
 use IO::Pipe;
-use POSIX qw(strftime dup2);
+use POSIX qw(strftime dup2 :errno_h);
 use IPC::Open2;

 $SIG{'PIPE'}="IGNORE";
@@ -429,22 +429,25 @@ sub getwd() {
 	return $pwd;
 }

+sub is_sha1 {
+	my $s = shift;
+	return $s =~ /^[a-zA-Z0-9]{40}$/;
+}

-sub get_headref($$) {
+sub get_headref ($$) {
 	my $name = shift;
 	my $git_dir = shift;
-	my $sha;

-	if (open(C,"$git_dir/refs/heads/$name")) {
-		chomp($sha = <C>);
-		close(C);
-		length($sha) == 40
-			or die "Cannot get head id for $name ($sha): $!\n";
+	my $f = "$git_dir/refs/heads/$name";
+	if(open(my $fh, $f)) {
+		chomp(my $r = <$fh>);
+		is_sha1($r) or die "Cannot get head id for $name ($r): $!";
+		return $r;
 	}
-	return $sha;
+	die "unable to open $f: $!" unless $! == POSIX::ENOENT;
+	return undef;
 }

-
 -d $git_tree
 	or mkdir($git_tree,0777)
 	or die "Could not create $git_tree: $!";
@@ -561,90 +564,67 @@
 #---------------------

 my $state = 0;

-my($patchset,$date,$author_name,$author_email,$branch,$ancestor,$tag,$logmsg);
-my(@old,@new,@skipped);
-sub commit {
-	my $pid;
-
+sub update_index (\@\@) {
+	my $old = shift;
+	my $new = shift;
 	open(my $fh, '|-', qw(git-update-index --index-info))
 		or die "unable to open git-update-index: $!";
 	print $fh
 		(map { "0 0000000000000000000000000000000000000000\t$_\n" }
-			@old),
+			@$old),
 		(map { '100' . sprintf('%o', $_->[0]) . " $_->[1]\t$_->[2]\n" }
-			@new)
+			@$new)
 		or die "unable to write to git-update-index: $!";
 	close $fh
 		or die "unable to write to git-update-index: $!";
 	$? and die "git-update-index reported error: $?";
-	@old = @new = ();
+}

-	$pid = open(C,"-|");
-	die "Cannot fork: $!" unless defined $pid;
-	unless($pid) {
-		exec("git-write-tree");
-		die "Cannot exec git-write-tree: $!\n";
-	}
-	chomp(my $tree = <C>);
-	length($tree) == 40
-		or die "Cannot get tree id ($tree): $!\n";
-	close(C)
+sub write_tree () {
+	open(my $fh, '-|', qw(git-write-tree))
+		or die "unable to open git-write-tree: $!";
+	chomp(my $tree = <$fh>);
+	is_sha1($tree)
+		or die "Cannot get tree id ($tree): $!";
+	close($fh)
 		or die "Error running git-write-tree: $?\n";
 	print "Tree ID $tree\n" if $opt_v;
+	return $tree;
+}

-	my $parent = "";
-	if(open(C,"$git_dir/refs/heads/$last_branch")) {
-		chomp($parent = <C>);
-		close(C);
-		length($parent) == 40
-			or die "Cannot get parent id ($parent): $!\n";
-		print "Parent ID $parent\n" if $opt_v;
-	}
-
-	my $pr = IO::Pipe->new() or die "Cannot open pipe: $!\n";
-	my $pw = IO::Pipe->new() or die "Cannot open pipe: $!\n";
-	$pid = fork();
-	die "Fork: $!\n" unless defined $pid;
-	unless($pid) {
-		$pr->writer();
-		$pw->reader();
-		open(OUT,">&STDOUT");
-		dup2($pw->fileno(),0);
-		dup2($pr->fileno(),1);
-		$pr->close();
-		$pw->close();
-
-		my @par = ();
-		@par = ("-p",$parent) if $parent;
-
-		# loose detection of merges
-		# based on the commit msg
-		foreach my $rx (@mergerx) {
-			if ($logmsg =~ $rx) {
-				my $mparent = $1;
-				if ($mparent eq 'HEAD') { $mparent = $opt_o };
-				if ( -e "$git_dir/refs/heads/$mparent") {
-					$mparent = get_headref($mparent, $git_dir);
-					push @par, '-p', $mparent;
-					print OUT "Merge parent branch: $mparent\n" if $opt_v;
-				}
-			}
+my($patchset,$date,$author_name,$author_email,$branch,$ancestor,$tag,$logmsg);
+my(@old,@new,@skipped);
+sub commit {
+	update_index(@old, @new);
+	@old = @new = ();
+	my $tree = write_tree();
+	my $parent = get_headref($last_branch, $git_dir);
+	print "Parent ID " . ($parent ? $parent : "(empty)") . "\n" if $opt_v;
+
+	my @commit_args;
+	push @commit_args, ("-p", $parent) if $parent;
+
+	# loose detection of merges
+	# based on the commit msg
+	foreach my $rx (@mergerx) {
+		next unless $logmsg =~ $rx && $1;
+		my $mparent = $1 eq 'HEAD' ? $opt_o : $1;
+		if(my $sha1 = get_headref($mparent, $git_dir)) {
+			push @commit_args, '-p', $mparent;
+			print "Merge parent branch: $mparent\n" if $opt_v;
 		}
-
-		exec("env",
-			"GIT_AUTHOR_NAME=$author_name",
-			"GIT_AUTHOR_EMAIL=$author_email",
-			"GIT_AUTHOR_DATE=".strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date)),
-			"GIT_COMMITTER_NAME=$author_name",
-			"GIT_COMMITTER_EMAIL=$author_email",
-			"GIT_COMMITTER_DATE=".strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date)),
-			"git-commit-tree", $tree,@par);
-		die "Cannot exec git-commit-tree: $!\n";
-
-		close OUT;
 	}
-	$pw->writer();
-	$pr->reader();
+
+	my $commit_date = strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date));
+	my $pid = open2(my $commit_read, my $commit_write,
+		'env',
+		"GIT_AUTHOR_NAME=$author_name",
+		"GIT_AUTHOR_EMAIL=$author_email",
+		"GIT_AUTHOR_DATE=$commit_date",
+		"GIT_COMMITTER_NAME=$author_name",
+		"GIT_COMMITTER_EMAIL=$author_email",
+		"GIT_COMMITTER_DATE=$commit_date",
+		'git-commit-tree', $tree, @commit_args);

 	# compatibility with git2cvs
 	substr($logmsg,32767) = "" if length($logmsg) > 32767;
@@ -656,16 +636,14 @@ sub commit {
 		@skipped = ();
 	}

-	print $pw "$logmsg\n"
+	print($commit_write "$logmsg\n") && close($commit_write)
 		or die "Error writing to git-commit-tree: $!\n";
-	$pw->close();

-	print "Committed patch $patchset ($branch ".strftime("%Y-%m-%d %H:%M:%S",gmtime($date)).")\n" if $opt_v;
-	chomp(my $cid = <$pr>);
-	length($cid) == 40
-		or die "Cannot get commit id ($cid): $!\n";
+	print "Committed patch $patchset ($branch $commit_date)\n" if $opt_v;
+	chomp(my $cid = <$commit_read>);
+	is_sha1($cid) or die "Cannot get commit id ($cid): $!\n";
 	print "Commit ID $cid\n" if $opt_v;
-	$pr->close();
+	close($commit_read);

 	waitpid($pid,0);
 	die "Error running git-commit-tree: $?\n" if $?;
--
1.3.3.gcb64-dirty
[parent not found: <7v4pzh6wtr.fsf@assigned-by-dhcp.cox.net>]
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  [not found] ` <7v4pzh6wtr.fsf@assigned-by-dhcp.cox.net>
@ 2006-05-23 7:13 ` Jeff King
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff King @ 2006-05-23 7:13 UTC
  To: Junio C Hamano; +Cc: git

[cc'd to list to get reactions on open2]

On Tue, May 23, 2006 at 12:10:08AM -0700, Junio C Hamano wrote:

> > +	return $s =~ /^[a-zA-Z0-9]{40}$/;
> [0-9a-f] (We always do lowercase).

Er, yes, that was a complete think-o on my part.

> Hmm. I personally do not have problems with open2, but folks on
> some other platforms might. I'll see how the list audience
> sounds.

FWIW, it was already being used in git-cvsimport.

-Peff
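
[Editorial sketch, not part of the thread.] The point of the review comment is that a 40-character alphanumeric string is not necessarily an object name; only lowercase hexadecimal digits qualify. A minimal illustration, assuming the stricter character class Junio suggests (the `defined` guard and the sample strings are additions for this example):

```perl
use strict;
use warnings;

# Strict variant: exactly 40 lowercase hexadecimal digits.
sub is_sha1 {
    my $s = shift;
    return defined $s && $s =~ /^[0-9a-f]{40}$/;
}

my $good = '34bd3dcd4bfd79bad35ce3fb08b2e21108195db8';
my $bad  = 'z' x 40;    # 40 characters, but not hexadecimal

for my $candidate ($good, $bad, 'deadbeef') {
    printf "%-40s %s\n", $candidate, is_sha1($candidate) ? 'ok' : 'rejected';
}
```

With the looser `[a-zA-Z0-9]` class, the string of forty `z` characters would have been accepted as well.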
* [PATCH 1/2] cvsimport: use git-update-index --index-info
  2006-05-23 7:00 ` [PATCH 2/2] cvsimport: cleanup commit function Jeff King
  [not found] ` <7v4pzh6wtr.fsf@assigned-by-dhcp.cox.net>
@ 2006-05-23 7:27 ` Jeff King
  2006-05-23 8:13 ` [PATCH 2/2] cvsimport: cleanup commit function Martin Langhoff
  2006-05-23 17:47 ` Morten Welinder
  3 siblings, 0 replies; 83+ messages in thread
From: Jeff King @ 2006-05-23 7:27 UTC
  To: git; +Cc: martin, junkio

This should reduce the number of git-update-index forks required per
commit. We now do adds/removes in one call, and we are no longer forced to
deal with argv limitations.

---

This is a repost using -z/NUL instead of line feeds.

d82d215430ae5e79210f73a31f5f8a053f36c27f
 git-cvsimport.perl |   36 +++++++++++++-----------------------
 1 files changed, 13 insertions(+), 23 deletions(-)

d82d215430ae5e79210f73a31f5f8a053f36c27f
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index d257e66..a65bea6 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -565,29 +565,19 @@ my($patchset,$date,$author_name,$author_
 my(@old,@new,@skipped);
 sub commit {
 	my $pid;
-	while(@old) {
-		my @o2;
-		if(@old > 55) {
-			@o2 = splice(@old,0,50);
-		} else {
-			@o2 = @old;
-			@old = ();
-		}
-		system("git-update-index","--force-remove","--",@o2);
-		die "Cannot remove files: $?\n" if $?;
-	}
-	while(@new) {
-		my @n2;
-		if(@new > 12) {
-			@n2 = splice(@new,0,10);
-		} else {
-			@n2 = @new;
-			@new = ();
-		}
-		system("git-update-index","--add",
-			(map { ('--cacheinfo', @$_) } @n2));
-		die "Cannot add files: $?\n" if $?;
-	}
+
+	open(my $fh, '|-', qw(git-update-index -z --index-info))
+		or die "unable to open git-update-index: $!";
+	print $fh
+		(map { "0 0000000000000000000000000000000000000000\t$_\0" }
+			@old),
+		(map { '100' . sprintf('%o', $_->[0]) . " $_->[1]\t$_->[2]\0" }
+			@new)
+		or die "unable to write to git-update-index: $!";
+	close $fh
+		or die "unable to write to git-update-index: $!";
+	$? and die "git-update-index reported error: $?";
+	@old = @new = ();

 	$pid = open(C,"-|");
 	die "Cannot fork: $!" unless defined $pid;
--
1.3.3.g3408
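
[Editorial sketch, not part of the thread.] The records the patch streams to `git-update-index -z --index-info` can be built and inspected without running git at all. The helper name `index_info_records` and the sample paths are hypothetical; the record layout is copied from the patch (mode 0 with an all-zero object name for a removal, `100<perm> <sha1>TAB<path>` for an addition, NUL after each record):

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only. $old is a list of paths to
# remove; $new is a list of [perm, sha1, path] triples, as in the patch.
sub index_info_records {
    my ($old, $new) = @_;
    my $zero = '0' x 40;
    return join '',
        (map { "0 $zero\t$_\0" } @$old),
        (map { '100' . sprintf('%o', $_->[0]) . " $_->[1]\t$_->[2]\0" } @$new);
}

my @old = ('removed file.txt');
# e69de29b... is the object name of the empty blob.
my @new = ([0644, 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391', 'BUGS-TODO']);

# Make the NUL terminators visible when printing.
(my $shown = index_info_records(\@old, \@new)) =~ s/\0/<NUL>\n/g;
print $shown;
```

Because every record is NUL-terminated, a path containing spaces or even a line feed cannot be confused with a record boundary, which is presumably why the repost switched to `-z`.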
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 7:00 ` [PATCH 2/2] cvsimport: cleanup commit function Jeff King
  [not found] ` <7v4pzh6wtr.fsf@assigned-by-dhcp.cox.net>
  2006-05-23 7:27 ` [PATCH 1/2] cvsimport: use git-update-index --index-info Jeff King
@ 2006-05-23 8:13 ` Martin Langhoff
  2006-05-23 8:24 ` Junio C Hamano
  2006-05-23 16:50 ` Linus Torvalds
  2006-05-23 17:47 ` Morten Welinder
  3 siblings, 2 replies; 83+ messages in thread
From: Martin Langhoff @ 2006-05-23 8:13 UTC
  To: Martin Langhoff, Junio C Hamano, Matthias Urlichs, git

Jeff,

good stuff -- aiming at exactly the things that had been nagging me.
Some minor notes on top of what junio's mentioned...

> +	die "unable to open $f: $!" unless $! == POSIX::ENOENT;
> +	return undef;

Heh. Is that the return of the living dead?

> +sub update_index (\@\@) {
> +	my $old = shift;
> +	my $new = shift;

Would it not make more sense to just pass them as plain parameters?

> +	print "Committed patch $patchset ($branch $commit_date)\n" if

Given that we have that -- should we remember it and avoid re-reading
the headref from disk? A %seenheads cache would save us 99.9% of the
hassle.

In related news, I've dealt with file reads from the socket being
memorybound. Should merge ok.

cheers,

martin
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 8:13 ` [PATCH 2/2] cvsimport: cleanup commit function Martin Langhoff
@ 2006-05-23 8:24 ` Junio C Hamano
  2006-05-23 20:32 ` Martin Langhoff
  2006-05-23 16:50 ` Linus Torvalds
  1 sibling, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2006-05-23 8:24 UTC
  To: git

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> Jeff,
>
> good stuff -- aiming at exactly the things that had been nagging me.
> Some minor notes on top of what junio's mentioned...
>
>> +	die "unable to open $f: $!" unless $! == POSIX::ENOENT;
>> +	return undef;
>
> Heh. Is that the return of the living dead?

Note the trailing "unless" there.

>> +sub update_index (\@\@) {
>> +	my $old = shift;
>> +	my $new = shift;
>
> Would it not make more sense to just pass them as plain parameters?

Meaning...? Perl5 can pass only one flat array, so the above is
a standard way to pass two arrays.

>> +	print "Committed patch $patchset ($branch $commit_date)\n" if
>
> Given that we have that -- should we remember it and avoid re-reading
> the headref from disk? A %seenheads cache would save us 99.9% of the
> hassle.
>
> In related news, I've dealt with file reads from the socket being
> memorybound. Should merge ok.

Merged OK, and I think your last suggestion makes sense. I'll
go to bed after pushing out Jeff's two patches and yours.
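
[Editorial sketch, not part of the thread.] Junio's remark about flat argument lists can be seen in a few lines. The sub names below are made up for the example; only the `(\@\@)` prototype is taken from the patch:

```perl
use strict;
use warnings;

# With a (\@\@) prototype, Perl passes a reference to each named array
# instead of flattening both into @_. The sub must be declared before
# the call for the prototype to take effect.
sub count_both (\@\@) {
    my ($old, $new) = @_;
    return (scalar @$old, scalar @$new);
}

# Without a prototype the two arrays arrive as one flat list.
sub count_flat {
    return scalar @_;
}

my @old = qw(a b c);
my @new = qw(d e);

my ($n_old, $n_new) = count_both(@old, @new);
print "with prototype: $n_old old, $n_new new\n";
print "flattened: ", count_flat(@old, @new), " items\n";
```

The prototyped call keeps the 3/2 split, while the plain call only sees five items with no boundary between them. Passing `\@old, \@new` explicitly to an unprototyped sub would work just as well; the prototype merely hides the backslashes at the call site.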
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 8:24 ` Junio C Hamano
@ 2006-05-23 20:32 ` Martin Langhoff
  0 siblings, 0 replies; 83+ messages in thread
From: Martin Langhoff @ 2006-05-23 20:32 UTC
  To: Junio C Hamano; +Cc: git

On 5/23/06, Junio C Hamano <junkio@cox.net> wrote:
> "Martin Langhoff" <martin.langhoff@gmail.com> writes:
>
> > Jeff,
> >
> > good stuff -- aiming at exactly the things that had been nagging me.
> > Some minor notes on top of what junio's mentioned...
> >
> >> +	die "unable to open $f: $!" unless $! == POSIX::ENOENT;
> >> +	return undef;
> >
> > Heh. Is that the return of the living dead?
>
> Note the trailing "unless" there.

Of course. I had actually missed the closing quotes, and thought the
error msg wanted to talk about POSIX. 'twas late in the day, seems
like most of my comments in this email were rather stoopid.

> >> +sub update_index (\@\@) {
> >> +	my $old = shift;
> >> +	my $new = shift;
> >
> > Would it not make more sense to just pass them as plain parameters?
>
> Meaning...? Perl5 can pass only one flat array, so the above is
> a standard way to pass two arrays.

Meaning I am stupid :(

> >> +	print "Committed patch $patchset ($branch $commit_date)\n" if
> >
> > Given that we have that -- should we remember it and avoid re-reading
> > the headref from disk? A %seenheads cache would save us 99.9% of the
> > hassle.
> >
> > In related news, I've dealt with file reads from the socket being
> > memorybound. Should merge ok.
>
> Merged OK, and I think your last suggestion makes sense. I'll
> go to bed after pushing out Jeff's two patches and yours.

I'll look into caching headrefs tonight if no one beats me to it.

martin
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 8:13 ` [PATCH 2/2] cvsimport: cleanup commit function Martin Langhoff
  2006-05-23 8:24 ` Junio C Hamano
@ 2006-05-23 16:50 ` Linus Torvalds
  2006-05-23 19:36 ` Linus Torvalds
  1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2006-05-23 16:50 UTC
  To: Martin Langhoff; +Cc: Junio C Hamano, Matthias Urlichs, git

Hmm. Is it just me, or does the current "git cvsimport" have new problems:

	[torvalds@merom git]$ git cvsimport -d ~/CVS gentoo-x86

causes

	Committing initial tree 34bd3dcd4bfd79bad35ce3fb08b2e21108195db8
	Server has gone away while fetching BUGS-TODO 1.1, retrying...
	Retry failed at /home/torvalds/bin/git-cvsimport line 366, <GEN2656> line 9.

and that's it for the import.

I don't see what would have caused it in the changes, but it definitely
worked earlier..

		Linus
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 16:50 ` Linus Torvalds
@ 2006-05-23 19:36 ` Linus Torvalds
  2006-05-23 20:25 ` Junio C Hamano
  2006-05-23 20:29 ` Martin Langhoff
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2006-05-23 19:36 UTC
  To: Martin Langhoff; +Cc: Junio C Hamano, Matthias Urlichs, git

On Tue, 23 May 2006, Linus Torvalds wrote:
>
> Hmm. Is it just me, or does the current "git cvsimport" have new problems:
>
> 	[torvalds@merom git]$ git cvsimport -d ~/CVS gentoo-x86
>
> causes
>
> 	Committing initial tree 34bd3dcd4bfd79bad35ce3fb08b2e21108195db8
> 	Server has gone away while fetching BUGS-TODO 1.1, retrying...
> 	Retry failed at /home/torvalds/bin/git-cvsimport line 366, <GEN2656> line 9.
>
> and that's it for the import.
>
> I don't see what would have caused it in the changes, but it definitely
> worked earlier..

Martin, that problem seems to go away when I initialize $res to 0 in
_fetchfile.

I don't know perl, and maybe local variables are pre-initialized to empty.

It's entirely possible that the fact that it now seems to work for me is
purely timing-related, since I also ended up using "-P cvsps-output" to
avoid having a huge cvsps binary in memory at the same time.

		Linus "perl illiterate" Torvalds
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 19:36 ` Linus Torvalds
@ 2006-05-23 20:25 ` Junio C Hamano
  2006-05-23 20:29 ` Martin Langhoff
  1 sibling, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2006-05-23 20:25 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

>> Committing initial tree 34bd3dcd4bfd79bad35ce3fb08b2e21108195db8
>> Server has gone away while fetching BUGS-TODO 1.1, retrying...
>...
> Martin, that problem seems to go away when I initialize $res to 0 in
> _fetchfile.
>
> I don't know perl, and maybe local variables are pre-initialized to empty.

When a new file that is empty is created, sub _line would call
sub _fetchfile with $cnt == 0, and it can return $res which is
initialized to 'undef'. That explains why sub file says
$self->_line() returned an undef and I think what you did is the
right fix.
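
[Editorial sketch, not part of the thread.] The failure mode Junio describes is easy to reproduce in isolation. The two subs below are simplified stand-ins, not the real `_fetchfile`: they only count bytes in chunks, on the assumption that the real routine accumulates its result inside a `while($cnt)` loop in the same way:

```perl
use strict;
use warnings;

# Stand-in for the pre-fix code: $res is only assigned inside the loop.
sub fetch_uninitialised {
    my ($cnt) = @_;
    my $res;                       # stays undef if the loop never runs
    while ($cnt) {
        my $chunk = $cnt > 1024 ? 1024 : $cnt;
        $res += $chunk;
        $cnt -= $chunk;
    }
    return $res;
}

# Stand-in for the fixed code: a zero-length file now yields 0.
sub fetch_initialised {
    my ($cnt) = @_;
    my $res = 0;
    while ($cnt) {
        my $chunk = $cnt > 1024 ? 1024 : $cnt;
        $res += $chunk;
        $cnt -= $chunk;
    }
    return $res;
}

for my $size (0, 3000) {
    my $before = fetch_uninitialised($size);
    my $after  = fetch_initialised($size);
    printf "size %4d: before=%s after=%s\n",
        $size, defined $before ? $before : 'undef', $after;
}
```

A caller that treats an undefined return value as "the server went away" will therefore misreport every empty file, which matches the symptom in the log above.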
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 19:36 ` Linus Torvalds
  2006-05-23 20:25 ` Junio C Hamano
@ 2006-05-23 20:29 ` Martin Langhoff
  2006-05-23 21:10 ` Jeff King
  1 sibling, 1 reply; 83+ messages in thread
From: Martin Langhoff @ 2006-05-23 20:29 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, Matthias Urlichs, git

On 5/24/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Martin, that problem seems to go away when I initialize $res to 0 in
> _fetchfile.
>
> I don't know perl, and maybe local variables are pre-initialized to empty.
>
> It's entirely possible that the fact that it now seems to work for me is
> purely timing-related, since I also ended up using "-P cvsps-output" to
> avoid having a huge cvsps binary in memory at the same time.

Strange! Cannot repro here with v5.8.8 (debian/etch 5.8.8-4) but
initialising it doesn't hurt, so let's do it:

diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index ace7087..abbfd0b 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -371,7 +371,7 @@ sub file {
 }
 sub _fetchfile {
 	my ($self, $fh, $cnt) = @_;
-	my $res;
+	my $res = 0;
 	my $bufsize = 1024 * 1024;
 	while($cnt) {
 		if ($bufsize > $cnt) {

cheers,

martin
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 20:29 ` Martin Langhoff
@ 2006-05-23 21:10 ` Jeff King
  2006-05-23 21:13 ` Martin Langhoff
  0 siblings, 1 reply; 83+ messages in thread
From: Jeff King @ 2006-05-23 21:10 UTC
  To: Martin Langhoff; +Cc: Linus Torvalds, Junio C Hamano, Matthias Urlichs, git

On Wed, May 24, 2006 at 08:29:07AM +1200, Martin Langhoff wrote:

> Strange! Cannot repro here with v5.8.8 (debian/etch 5.8.8-4) but
> initialising it doesn't hurt, so let's do it:

I can reproduce with debian perl 5.8.8-4. The bug is only triggered by
0-length files, so presumably your test repo doesn't have any.

-Peff
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 21:10 ` Jeff King
@ 2006-05-23 21:13 ` Martin Langhoff
  0 siblings, 0 replies; 83+ messages in thread
From: Martin Langhoff @ 2006-05-23 21:13 UTC
  To: Martin Langhoff, Linus Torvalds, Junio C Hamano, Matthias Urlichs, git

On 5/24/06, Jeff King <peff@peff.net> wrote:
> On Wed, May 24, 2006 at 08:29:07AM +1200, Martin Langhoff wrote:
>
> > Strange! Cannot repro here with v5.8.8 (debian/etch 5.8.8-4) but
> > initialising it doesn't hurt, so let's do it:
>
> I can reproduce with debian perl 5.8.8-4. The bug is only triggered by
> 0-length files, so presumably your test repo doesn't have any.

Given that we are all working off the gentoo repo here, it means that
my machine is slower than Linus' unreleased Intel box. And that I am
too impatient...

In any case, the fix is correct as Junio points out.

cheers,

martin
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 7:00 ` [PATCH 2/2] cvsimport: cleanup commit function Jeff King
  ` (2 preceding siblings ...)
  2006-05-23 8:13 ` [PATCH 2/2] cvsimport: cleanup commit function Martin Langhoff
@ 2006-05-23 17:47 ` Morten Welinder
  2006-05-23 20:59 ` Jeff King
  3 siblings, 1 reply; 83+ messages in thread
From: Morten Welinder @ 2006-05-23 17:47 UTC
  To: Martin Langhoff, Junio C Hamano, Matthias Urlichs, git

Why run "env" and not just muck with %ENV?

M.

> +	my $pid = open2(my $commit_read, my $commit_write,
> +		'env',
> +		"GIT_AUTHOR_NAME=$author_name",
> +		"GIT_AUTHOR_EMAIL=$author_email",
> +		"GIT_AUTHOR_DATE=$commit_date",
> +		"GIT_COMMITTER_NAME=$author_name",
> +		"GIT_COMMITTER_EMAIL=$author_email",
> +		"GIT_COMMITTER_DATE=$commit_date",
> +		'git-commit-tree', $tree, @commit_args);
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 17:47 ` Morten Welinder
@ 2006-05-23 20:59 ` Jeff King
  2006-05-23 23:41 ` Junio C Hamano
  0 siblings, 1 reply; 83+ messages in thread
From: Jeff King @ 2006-05-23 20:59 UTC
  To: Morten Welinder; +Cc: Martin Langhoff, Junio C Hamano, Matthias Urlichs, git

On Tue, May 23, 2006 at 01:47:01PM -0400, Morten Welinder wrote:

> Why run "env" and not just muck with %ENV?
> >+	my $pid = open2(my $commit_read, my $commit_write,
> >+		'env',
> >+		"GIT_AUTHOR_NAME=$author_name",
> >+		"GIT_AUTHOR_EMAIL=$author_email",
> >+		"GIT_AUTHOR_DATE=$commit_date",
> >+		"GIT_COMMITTER_NAME=$author_name",
> >+		"GIT_COMMITTER_EMAIL=$author_email",
> >+		"GIT_COMMITTER_DATE=$commit_date",
> >+		'git-commit-tree', $tree, @commit_args);

Oops, that's an obvious fork optimization that I should have caught.
Patch is below.

Note that this will now affect the environment of all sub-processes, but
it shouldn't matter since we reset it right before commit. However, if
anyone is worried, we can stash the old %ENV in another hash temporarily.

-Peff

PS What is the preferred format for throwing patches into replies like
this? Putting the patch at the end (as here) or throwing the reply
comments in the ignored section near the diffstat?

---
cvsimport: set up commit environment in perl instead of using env

---

44c4a9f67322302ca49146a7c143c07ea67da366
 git-cvsimport.perl |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

44c4a9f67322302ca49146a7c143c07ea67da366
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index 41ee9a6..83d7d3c 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -618,14 +618,13 @@ sub commit {
 	}

 	my $commit_date = strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date));
+	$ENV{GIT_AUTHOR_NAME} = $author_name;
+	$ENV{GIT_AUTHOR_EMAIL} = $author_email;
+	$ENV{GIT_AUTHOR_DATE} = $commit_date;
+	$ENV{GIT_COMMITTER_NAME} = $author_name;
+	$ENV{GIT_COMMITTER_EMAIL} = $author_email;
+	$ENV{GIT_COMMITTER_DATE} = $commit_date;
 	my $pid = open2(my $commit_read, my $commit_write,
-		'env',
-		"GIT_AUTHOR_NAME=$author_name",
-		"GIT_AUTHOR_EMAIL=$author_email",
-		"GIT_AUTHOR_DATE=$commit_date",
-		"GIT_COMMITTER_NAME=$author_name",
-		"GIT_COMMITTER_EMAIL=$author_email",
-		"GIT_COMMITTER_DATE=$commit_date",
 		'git-commit-tree', $tree, @commit_args);

 	# compatibility with git2cvs
--
1.3.3.g40505-dirty
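
[Editorial sketch, not part of the thread.] Jeff's patch assigns to `%ENV` directly and accepts that later sub-processes inherit the values. For readers who do want the "stash the old %ENV" behaviour he mentions, one possible shape is to localise just the affected keys. The wrapper name `run_with_env` is hypothetical, and the running perl (`$^X`) stands in for `git-commit-tree`:

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a command with extra environment variables.
# local() on the hash slice restores (or deletes) the keys on scope exit,
# so the parent's environment is unchanged afterwards.
sub run_with_env {
    my ($env, @cmd) = @_;
    local @ENV{keys %$env} = values %$env;
    open(my $fh, '-|', @cmd) or die "unable to run $cmd[0]: $!";
    my $out = do { local $/; <$fh> };
    close($fh) or die "error running $cmd[0]: $?\n";
    return $out;
}

my $out = run_with_env(
    { GIT_AUTHOR_NAME => 'A U Thor' },
    $^X, '-e', 'print $ENV{GIT_AUTHOR_NAME}',
);
print "child saw: $out\n";
print "parent has GIT_AUTHOR_NAME: ",
    (exists $ENV{GIT_AUTHOR_NAME} ? 'yes' : 'no'), "\n";
```

Either way no extra `env` process is spawned; the difference is only whether the variables outlive the call.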
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 20:59 ` Jeff King
@ 2006-05-23 23:41 ` Junio C Hamano
  2006-05-24 9:52 ` Jeff King
  0 siblings, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2006-05-23 23:41 UTC
  To: Jeff King; +Cc: Morten Welinder, Martin Langhoff, Matthias Urlichs, git

Jeff King <peff@peff.net> writes:

> On Tue, May 23, 2006 at 01:47:01PM -0400, Morten Welinder wrote:
>
>> Why run "env" and not just muck with %ENV?
>> >+	my $pid = open2(my $commit_read, my $commit_write,
>> >+		'env',
>> >+		"GIT_AUTHOR_NAME=$author_name",
>> >+		"GIT_AUTHOR_EMAIL=$author_email",
>> >+		"GIT_AUTHOR_DATE=$commit_date",
>> >+		"GIT_COMMITTER_NAME=$author_name",
>> >+		"GIT_COMMITTER_EMAIL=$author_email",
>> >+		"GIT_COMMITTER_DATE=$commit_date",
>> >+		'git-commit-tree', $tree, @commit_args);
>
> Oops, that's an obvious fork optimization that I should have caught.

Are you two talking about running git-commit-tree via env is two
fork-execs instead of just one? Does that have a measurable
difference?

Not that I have anything against the updated code, but I do not
particularly think it is such a big issue.

> PS What is the preferred format for throwing patches into replies like
> this? Putting the patch at the end (as here) or throwing the reply
> comments in the ignored section near the diffstat?

You could do it either way. Although I personally find the
former easier to read (meshes well with "do not top post"
mantra), it appears many other people find the cover letter
material should come after the first '---' separator.

If you append the patch to your message, btw, you would need to
realize that the receiving end needs to edit your message to
remove the top part before running "git am" to apply.
* Re: [PATCH 2/2] cvsimport: cleanup commit function
  2006-05-23 23:41 ` Junio C Hamano
@ 2006-05-24 9:52 ` Jeff King
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff King @ 2006-05-24 9:52 UTC
  To: Junio C Hamano; +Cc: Morten Welinder, Martin Langhoff, Matthias Urlichs, git

On Tue, May 23, 2006 at 04:41:33PM -0700, Junio C Hamano wrote:

> Are you two talking about running git-commit-tree via env is two
> fork-execs instead of just one? Does that have a measurable
> difference?

Yes, that's what I was talking about. No, probably not a huge
difference. I did some performance measurements of all of the recent
cvsimport changes on a small-ish personal repo (I don't have the gentoo
repo). The results were not significant (<= 1% improvement for each
change). I would expect some of the changes (index-info, fetchfile) to
have an impact on a repo with different characteristics (like the gentoo
one).

-Peff
* Re: irc usage..
  2006-05-22 19:09 ` Donnie Berkholz
  2006-05-22 19:38 ` Linus Torvalds
  2006-05-22 19:41 ` Martin Langhoff
@ 2006-05-22 20:16 ` Donnie Berkholz
  2 siblings, 0 replies; 83+ messages in thread
From: Donnie Berkholz @ 2006-05-22 20:16 UTC
  To: Donnie Berkholz
  Cc: Martin Langhoff, Linus Torvalds, Yann Dirson, Git Mailing List,
      Matthias Urlichs, Johannes Schindelin

Donnie Berkholz wrote:
> OK, I started a new run without -L, and I'm watching it in top right
> now.

Tried a run with -L 1024 and it broke in just a couple of minutes:

Fetching sys-kernel/linux/files/2.4.0.8/linux-2.4.0-ac8-reiserfs-3.6.25-nfs.diff.gz v 1.1
New sys-kernel/linux/files/2.4.0.8/linux-2.4.0-ac8-reiserfs-3.6.25-nfs.diff.gz: 6367 bytes
Tree ID 457f629df10e70a5ef430f431eca27ed02a83d46
Parent ID 0541d8b54a02df3be50d529497236556c6862a4c
Committed patch 1024 (origin 2001-01-13 00:29:39)
Commit ID ba9d995d12a37502a851e198b67e141623f79544
DONE; creating master branch
cat: write error: Broken pipe

Thanks,
Donnie
* Re: irc usage..
  2006-05-20 22:45 ` Linus Torvalds
  2006-05-20 23:12 ` Donnie Berkholz
@ 2006-05-21 9:46 ` Thomas Glanzmann
  1 sibling, 0 replies; 83+ messages in thread
From: Thomas Glanzmann @ 2006-05-21 9:46 UTC
  To: Linus Torvalds; +Cc: Donnie Berkholz, Yann Dirson, Git Mailing List

Hello Linus,

> and I'm a humanitarian - rescuing people from CVS is
> to me not just a good idea, it's a moral imperative.

you're a very brave man.

	Thomas
* Re: irc usage..
  2006-05-20 22:18 ` Donnie Berkholz
  2006-05-20 22:45 ` Linus Torvalds
@ 2006-05-21 1:14 ` Donnie Berkholz
  1 sibling, 0 replies; 83+ messages in thread
From: Donnie Berkholz @ 2006-05-21 1:14 UTC
  To: Donnie Berkholz; +Cc: Yann Dirson, Linus Torvalds, Git Mailing List

Donnie Berkholz wrote:
> Somebody else tried importing it with git-cvsimport, but he said he hit
> some kind of problem and recalled that it was a cvsps segfault. Sounds
> about right, since I've never gotten cvsps to run successfully on the
> whole repo either.

Much to my surprise, a cvsps run I started earlier has just finished
without segfaulting. But attempts to actually run cvsps (e.g., cvsps -a
spyderous) spit thousands of warnings of "WARNING: revision 1.1.1.1 of
file $FILENAME on unnamed branch".

Thanks,
Donnie
* Re: irc usage..
  2006-05-20 20:39 ` Yann Dirson
  2006-05-20 22:18 ` Donnie Berkholz
@ 2006-05-22 1:45 ` Linus Torvalds
  1 sibling, 0 replies; 83+ messages in thread
From: Linus Torvalds @ 2006-05-22 1:45 UTC
  To: Yann Dirson; +Cc: Git Mailing List

On Sat, 20 May 2006, Yann Dirson wrote:
>
> FWIW, I have mentioned a problem that may be the same, under
> Message-ID <20060107090148.GB32585@nowhere.earth>, that was on January
> 7th. Namely, when importing a repository with very large files over
> pserver or ssh, timeouts can occur and prevent the import from
> working. But, as you said, it's not easy to get precise info from the
> logs :)

For big repositories, you really shouldn't use pserver or ssh anyway. You
should try really really hard to just get a local copy, and do it that
way. It's going to be tons faster, and will avoid a lot of the problems,
including network timeouts etc.

		Linus
[parent not found: <1148369266352-git-send-email-1>]
* [PATCH 2/2] cvsimport: cleanup commit function
  [not found] <1148369266352-git-send-email-1>
@ 2006-05-23 7:27 ` Jeff King
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff King @ 2006-05-23 7:27 UTC
  To: git; +Cc: martin, junkio

This change attempts to clean up the commit function to make it a bit
easier to read (or at least the first half of it). It also improves
robustness and performance. Specifically:
  - report get_headref errors on opening ref unless the error is ENOENT
  - use regex to check for sha1 instead of length
  - use lexically scoped filehandles which get cleaned up automagically
  - check for error on both 'print' and 'close' (since output is buffered)
  - avoid "fork, do some perl, then exec" in commit(). It's not necessary,
    and we probably end up COW'ing parts of the perl process. Plus the code
    is much smaller because we can use open2()
  - avoid calling strftime over and over (mainly a readability cleanup)

---

This is a repost with some minor fixups from Junio (and based off of the
fixed 1/2 patch).

3408c8d8364f816a7c4a34a03045f466bf028540
 git-cvsimport.perl |  150 ++++++++++++++++++++++------------------------------
 1 files changed, 64 insertions(+), 86 deletions(-)

3408c8d8364f816a7c4a34a03045f466bf028540
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index a65bea6..219f6dc 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -23,7 +23,7 @@ use File::Basename qw(basename dirname);
 use Time::Local;
 use IO::Socket;
 use IO::Pipe;
-use POSIX qw(strftime dup2);
+use POSIX qw(strftime dup2 :errno_h);
 use IPC::Open2;

 $SIG{'PIPE'}="IGNORE";
@@ -429,22 +429,25 @@ sub getwd() {
 	return $pwd;
 }

+sub is_sha1 {
+	my $s = shift;
+	return $s =~ /^[a-f0-9]{40}$/;
+}

-sub get_headref($$) {
+sub get_headref ($$) {
 	my $name = shift;
 	my $git_dir = shift;
-	my $sha;

-	if (open(C,"$git_dir/refs/heads/$name")) {
-		chomp($sha = <C>);
-		close(C);
-		length($sha) == 40
-			or die "Cannot get head id for $name ($sha): $!\n";
+	my $f = "$git_dir/refs/heads/$name";
+	if(open(my $fh, $f)) {
+		chomp(my $r = <$fh>);
+		is_sha1($r) or die "Cannot get head id for $name ($r): $!";
+		return $r;
 	}
-	return $sha;
+	die "unable to open $f: $!" unless $! == POSIX::ENOENT;
+	return undef;
 }

-
 -d $git_tree
 	or mkdir($git_tree,0777)
 	or die "Could not create $git_tree: $!";
@@ -561,90 +564,67 @@
 #---------------------

 my $state = 0;

-my($patchset,$date,$author_name,$author_email,$branch,$ancestor,$tag,$logmsg);
-my(@old,@new,@skipped);
-sub commit {
-	my $pid;
-
+sub update_index (\@\@) {
+	my $old = shift;
+	my $new = shift;
 	open(my $fh, '|-', qw(git-update-index -z --index-info))
 		or die "unable to open git-update-index: $!";
 	print $fh
 		(map { "0 0000000000000000000000000000000000000000\t$_\0" }
-			@old),
+			@$old),
 		(map { '100' . sprintf('%o', $_->[0]) . " $_->[1]\t$_->[2]\0" }
-			@new)
+			@$new)
 		or die "unable to write to git-update-index: $!";
 	close $fh
 		or die "unable to write to git-update-index: $!";
 	$? and die "git-update-index reported error: $?";
-	@old = @new = ();
+}

-	$pid = open(C,"-|");
-	die "Cannot fork: $!" unless defined $pid;
-	unless($pid) {
-		exec("git-write-tree");
-		die "Cannot exec git-write-tree: $!\n";
-	}
-	chomp(my $tree = <C>);
-	length($tree) == 40
-		or die "Cannot get tree id ($tree): $!\n";
-	close(C)
+sub write_tree () {
+	open(my $fh, '-|', qw(git-write-tree))
+		or die "unable to open git-write-tree: $!";
+	chomp(my $tree = <$fh>);
+	is_sha1($tree)
+		or die "Cannot get tree id ($tree): $!";
+	close($fh)
 		or die "Error running git-write-tree: $?\n";
 	print "Tree ID $tree\n" if $opt_v;
+	return $tree;
+}

-	my $parent = "";
-	if(open(C,"$git_dir/refs/heads/$last_branch")) {
-		chomp($parent = <C>);
-		close(C);
-		length($parent) == 40
-			or die "Cannot get parent id ($parent): $!\n";
-		print "Parent ID $parent\n" if $opt_v;
-	}
-
-	my $pr = IO::Pipe->new() or die "Cannot open pipe: $!\n";
-	my $pw = IO::Pipe->new() or die "Cannot open pipe: $!\n";
-	$pid = fork();
-	die "Fork: $!\n" unless defined $pid;
-	unless($pid) {
-		$pr->writer();
-		$pw->reader();
-		open(OUT,">&STDOUT");
-		dup2($pw->fileno(),0);
-		dup2($pr->fileno(),1);
-		$pr->close();
-		$pw->close();
-
-		my @par = ();
-		@par = ("-p",$parent) if $parent;
-
-		# loose detection of merges
-		# based on the commit msg
-		foreach my $rx (@mergerx) {
-			if ($logmsg =~ $rx) {
-				my $mparent = $1;
-				if ($mparent eq 'HEAD') { $mparent = $opt_o };
-				if ( -e "$git_dir/refs/heads/$mparent") {
-					$mparent = get_headref($mparent, $git_dir);
-					push @par, '-p', $mparent;
-					print OUT "Merge parent branch: $mparent\n" if $opt_v;
-				}
-			}
+my($patchset,$date,$author_name,$author_email,$branch,$ancestor,$tag,$logmsg);
+my(@old,@new,@skipped);
+sub commit {
+	update_index(@old, @new);
+	@old = @new = ();
+	my $tree = write_tree();
+	my $parent = get_headref($last_branch, $git_dir);
+	print "Parent ID " . ($parent ? $parent : "(empty)") . "\n" if $opt_v;
+
+	my @commit_args;
+	push @commit_args, ("-p", $parent) if $parent;
+
+	# loose detection of merges
+	# based on the commit msg
+	foreach my $rx (@mergerx) {
+		next unless $logmsg =~ $rx && $1;
+		my $mparent = $1 eq 'HEAD' ? $opt_o : $1;
+		if(my $sha1 = get_headref($mparent, $git_dir)) {
+			push @commit_args, '-p', $mparent;
+			print "Merge parent branch: $mparent\n" if $opt_v;
 		}
-
-		exec("env",
-			"GIT_AUTHOR_NAME=$author_name",
-			"GIT_AUTHOR_EMAIL=$author_email",
-			"GIT_AUTHOR_DATE=".strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date)),
-			"GIT_COMMITTER_NAME=$author_name",
-			"GIT_COMMITTER_EMAIL=$author_email",
-			"GIT_COMMITTER_DATE=".strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date)),
-			"git-commit-tree", $tree,@par);
-		die "Cannot exec git-commit-tree: $!\n";
-
-		close OUT;
 	}
-	$pw->writer();
-	$pr->reader();
+
+	my $commit_date = strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date));
+	my $pid = open2(my $commit_read, my $commit_write,
+		'env',
+		"GIT_AUTHOR_NAME=$author_name",
+		"GIT_AUTHOR_EMAIL=$author_email",
+		"GIT_AUTHOR_DATE=$commit_date",
+		"GIT_COMMITTER_NAME=$author_name",
+		"GIT_COMMITTER_EMAIL=$author_email",
+		"GIT_COMMITTER_DATE=$commit_date",
+		'git-commit-tree', $tree, @commit_args);

 	# compatibility with git2cvs
 	substr($logmsg,32767) = "" if length($logmsg) > 32767;
@@ -656,16 +636,14 @@ sub commit {
 		@skipped = ();
 	}

-	print $pw "$logmsg\n"
+	print($commit_write "$logmsg\n") && close($commit_write)
 		or die "Error writing to git-commit-tree: $!\n";
-	$pw->close();

-	print "Committed patch $patchset ($branch ".strftime("%Y-%m-%d %H:%M:%S",gmtime($date)).")\n" if $opt_v;
-	chomp(my $cid = <$pr>);
-	length($cid) == 40
-		or die "Cannot get commit id ($cid): $!\n";
+	print "Committed patch $patchset ($branch $commit_date)\n" if $opt_v;
+	chomp(my $cid = <$commit_read>);
+	is_sha1($cid) or die "Cannot get commit id ($cid): $!\n";
 	print "Commit ID $cid\n" if $opt_v;
-	$pr->close();
+	close($commit_read);

 	waitpid($pid,0);
 	die "Error running git-commit-tree: $?\n" if $?;
--
1.3.3.g3408
end of thread, other threads:[~2006-06-05 16:08 UTC | newest]
Thread overview: 83+ messages
2006-05-20 17:26 irc usage Linus Torvalds
2006-05-20 17:50 ` Junio C Hamano
2006-05-20 18:52 ` Jakub Narebski
2006-05-20 20:39 ` Yann Dirson
2006-05-20 22:18 ` Donnie Berkholz
2006-05-20 22:45 ` Linus Torvalds
2006-05-20 23:12 ` Donnie Berkholz
2006-05-21 19:24 ` Linus Torvalds
2006-05-22 3:59 ` Linus Torvalds
2006-05-22 4:19 ` Donnie Berkholz
2006-05-22 4:50 ` Linus Torvalds
2006-05-22 5:04 ` Martin Langhoff
2006-05-22 5:21 ` Donnie Berkholz
2006-05-22 7:42 ` Martin Langhoff
2006-05-22 9:13 ` Linus Torvalds
2006-05-22 12:54 ` Martin Langhoff
2006-05-22 17:27 ` Linus Torvalds
2006-05-22 17:51 ` Jakub Narebski
2006-05-22 18:03 ` Linus Torvalds
2006-05-22 19:03 ` Matthias Lederhofer
2006-05-22 19:09 ` Junio C Hamano
2006-05-23 20:19 ` Jakub Narebski
2006-05-22 19:46 ` Martin Langhoff
2006-05-22 19:09 ` Donnie Berkholz
2006-05-22 19:38 ` Linus Torvalds
2006-05-22 19:49 ` Donnie Berkholz
2006-05-22 20:20 ` Linus Torvalds
2006-05-22 21:48 ` Donnie Berkholz
2006-05-29 21:54 ` Donnie Berkholz
2006-05-29 22:21 ` Martin Langhoff
2006-05-29 22:32 ` Donnie Berkholz
2006-05-30 0:19 ` Martin Langhoff
2006-05-30 5:31 ` Donnie Berkholz
2006-05-30 6:01 ` Martin Langhoff
2006-05-30 0:43 ` Linus Torvalds
2006-05-30 22:31 ` Martin Langhoff
2006-05-30 23:07 ` Linus Torvalds
2006-05-31 1:04 ` Martin Langhoff
2006-05-31 2:49 ` Donnie Berkholz
2006-05-31 6:05 ` Martin Langhoff
2006-05-31 13:54 ` Alec Warner
2006-05-31 22:03 ` Martin Langhoff
2006-06-01 1:42 ` Alec Warner
2006-06-01 7:47 ` Martin Langhoff
2006-06-05 0:33 ` Alec Warner
2006-06-05 2:06 ` Martin Langhoff
2006-06-05 2:36 ` Alec Warner
2006-06-05 3:49 ` Martin Langhoff
[not found] ` <20060605120743.566fb85f.seanlkml@sympatico.ca>
2006-06-05 16:07 ` Sean
2006-05-22 19:41 ` Martin Langhoff
2006-05-22 20:11 ` Linus Torvalds
2006-05-22 20:33 ` Linus Torvalds
2006-05-22 21:41 ` Matthias Urlichs
2006-05-22 22:18 ` Linus Torvalds
2006-05-22 23:23 ` Martin Langhoff
2006-05-22 23:29 ` Martin Langhoff
2006-05-22 23:33 ` Linus Torvalds
2006-05-22 22:39 ` Junio C Hamano
2006-05-22 23:15 ` Martin Langhoff
2006-05-23 6:52 ` Jeff King
2006-05-23 6:58 ` Jeff King
2006-05-23 7:01 ` [PATCH 1/2] cvsimport: use git-update-index --index-info Jeff King
2006-05-23 7:00 ` [PATCH 2/2] cvsimport: cleanup commit function Jeff King
[not found] ` <7v4pzh6wtr.fsf@assigned-by-dhcp.cox.net>
2006-05-23 7:13 ` Jeff King
2006-05-23 7:27 ` [PATCH 1/2] cvsimport: use git-update-index --index-info Jeff King
2006-05-23 8:13 ` [PATCH 2/2] cvsimport: cleanup commit function Martin Langhoff
2006-05-23 8:24 ` Junio C Hamano
2006-05-23 20:32 ` Martin Langhoff
2006-05-23 16:50 ` Linus Torvalds
2006-05-23 19:36 ` Linus Torvalds
2006-05-23 20:25 ` Junio C Hamano
2006-05-23 20:29 ` Martin Langhoff
2006-05-23 21:10 ` Jeff King
2006-05-23 21:13 ` Martin Langhoff
2006-05-23 17:47 ` Morten Welinder
2006-05-23 20:59 ` Jeff King
2006-05-23 23:41 ` Junio C Hamano
2006-05-24 9:52 ` Jeff King
2006-05-22 20:16 ` irc usage Donnie Berkholz
2006-05-21 9:46 ` Thomas Glanzmann
2006-05-21 1:14 ` Donnie Berkholz
2006-05-22 1:45 ` Linus Torvalds
[not found] <1148369266352-git-send-email-1>
2006-05-23 7:27 ` [PATCH 2/2] cvsimport: cleanup commit function Jeff King