* git-svn should default to --repack @ 2008-01-18 12:17 Kevin Ballard 2008-01-18 15:56 ` Karl Hasselström 0 siblings, 1 reply; 30+ messages in thread From: Kevin Ballard @ 2008-01-18 12:17 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 790 bytes --] I was very surprised to find that git-svn does not in fact default to --repack. I firmly believe it should. Here's an example as to why it should. I used git-svn to import a repository with 33000 revisions and about 7500 files. It took about 18 hours to import. When it was done, my .git folder had 242001 files that comprised 2.0GB. I ran `git gc --aggressive --prune` and let that sit overnight (I wish it was more verbose, it went for over an hour without printing anything), and that managed to compress the repo down to 334 files and 64MB. Now I have to figure out how to delete the .git folder from my regular backups. http://skitch.com/kballard/r7mn/results-of-git-gc-ono-macports-repo -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 2432 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-svn should default to --repack 2008-01-18 12:17 git-svn should default to --repack Kevin Ballard @ 2008-01-18 15:56 ` Karl Hasselström 2008-01-18 20:44 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Karl Hasselström @ 2008-01-18 15:56 UTC (permalink / raw) To: Kevin Ballard; +Cc: git On 2008-01-18 07:17:55 -0500, Kevin Ballard wrote: > I was very surprised to find that git-svn does not in fact default > to --repack. I firmly believe it should. I believe so too. And nowadays there's "git gc --auto", which was made for occasions such as this, so it should be a breeze to implement. The overhead might be low enough that it can be called after _every_ imported revision. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-svn should default to --repack 2008-01-18 15:56 ` Karl Hasselström @ 2008-01-18 20:44 ` Junio C Hamano 2008-01-19 12:35 ` Karl Hasselström 0 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2008-01-18 20:44 UTC (permalink / raw) To: Karl Hasselström; +Cc: Kevin Ballard, git Karl Hasselström <kha@treskal.com> writes: > On 2008-01-18 07:17:55 -0500, Kevin Ballard wrote: > >> I was very surprised to find that git-svn does not in fact default >> to --repack. I firmly believe it should. > > I believe so too. And nowadays there's "git gc --auto", which was made > for occasions such as this, so it should be a breeze to implement. The > overhead might be low enough that it can be called after _every_ > imported revision. Careful. I made the same mistake and it had to be corrected with e0cd252eb0ba6453acd64762625b004aa4cc162b. "gc --auto" after every 1000 or so feels like a good default and I would agree that would be a real fix to a real usability bug. Patches? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-svn should default to --repack 2008-01-18 20:44 ` Junio C Hamano @ 2008-01-19 12:35 ` Karl Hasselström 2008-01-19 15:05 ` Kevin Ballard 2008-01-19 22:36 ` [PATCH] Let "git svn" run "git gc --auto" occasionally Karl Hasselström 0 siblings, 2 replies; 30+ messages in thread From: Karl Hasselström @ 2008-01-19 12:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kevin Ballard, git On 2008-01-18 12:44:08 -0800, Junio C Hamano wrote: > Karl Hasselström <kha@treskal.com> writes: > > > I believe so too. And nowadays there's "git gc --auto", which was > > made for occasions such as this, so it should be a breeze to > > implement. The overhead might be low enough that it can be called > > after _every_ imported revision. > > Careful. I made the same mistake and it had to be corrected with > e0cd252eb0ba6453acd64762625b004aa4cc162b. > > "gc --auto" after every 1000 or so feels like a good default and I > would agree that would be a real fix to a real usability bug. I think 1000 might be too high; considering that (at least in my experience) it takes on the order of 250-500 ms to import a commit, the gc --auto overhead of maybe 10 ms isn't so bad. A good compromise might be to run gc --auto after every 10-100 commits, _and_ when the import is done. However, if gc --auto always takes a lot of time without accomplishing anything in the presence of too many unreachable loose objects it might not be a good idea to run it at all, since the use of git-svn involves frequent rebasing. > Patches? Just hot air and noise for now from my end. Sorry. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-svn should default to --repack 2008-01-19 12:35 ` Karl Hasselström @ 2008-01-19 15:05 ` Kevin Ballard 2008-01-19 22:36 ` [PATCH] Let "git svn" run "git gc --auto" occasionally Karl Hasselström 1 sibling, 0 replies; 30+ messages in thread From: Kevin Ballard @ 2008-01-19 15:05 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1812 bytes --] Note: CC list pruned as, once again, my Mail client decided to send the original message as HTML and it got bounced from the list. Original CC list: kha@treskal.com, gitster@pobox.com On Jan 19, 2008, at 7:35 AM, Karl Hasselström wrote: > On 2008-01-18 12:44:08 -0800, Junio C Hamano wrote: > >> Karl Hasselström <kha@treskal.com> writes: >> >>> I believe so too. And nowadays there's "git gc --auto", which was >>> made for occasions such as this, so it should be a breeze to >>> implement. The overhead might be low enough that it can be called >>> after _every_ imported revision. >> >> Careful. I made the same mistake and it had to be corrected with >> e0cd252eb0ba6453acd64762625b004aa4cc162b. >> >> "gc --auto" after every 1000 or so feels like a good default and I >> would agree that would be a real fix to a real usability bug. > > I think 1000 might be too high; considering that (at least in my > experience) it takes on the order of 250-500 ms to import a commit, > the gc --auto overhead of maybe 10 ms isn't so bad. > > A good compromise might be to run gc --auto after every 10-100 > commits, _and_ when the import is done. > > However, if gc --auto always takes a lot of time without accomplishing > anything in the presence of too many unreachable loose objects it > might not be a good idea to run it at all, since the use of git-svn > involves frequent rebasing. I don't know much about how this works, so if git gc --auto might have a problem, it seems the simplest fix for now would be to default git-svn to having --repack=1000 on. >> Patches? > > Just hot air and noise for now from my end. Sorry. 
Same. I don't know Perl. Sorry. -Kevin Ballard -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
* [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-19 12:35 ` Karl Hasselström 2008-01-19 15:05 ` Kevin Ballard @ 2008-01-19 22:36 ` Karl Hasselström 2008-01-19 22:50 ` Harvey Harrison 1 sibling, 1 reply; 30+ messages in thread From: Karl Hasselström @ 2008-01-19 22:36 UTC (permalink / raw) To: Eric Wong; +Cc: git, Kevin Ballard, Junio C Hamano Let "git svn" run "git gc --auto" every 100 imported commits, to reduce the number of loose objects. To handle the common use case of frequent imports, where each invocation typically fetches less than 100 commits, randomly set the counter to something in the range 1-100 on initialization. It's almost as good as saving the counter, and much less of a hassle. Oh, and 100 is just my best guess at a reasonable number. It could conceivably need tweaking. Signed-off-by: Karl Hasselström <kha@treskal.com> --- On 2008-01-19 13:35:57 +0100, Karl Hasselström wrote: > On 2008-01-18 12:44:08 -0800, Junio C Hamano wrote: > > > Patches? > > Just hot air and noise for now from my end. Sorry. OK, it didn't feel good saying that. So here's my attempt at being a model citizen. (It's not hard with a change this small ...) I'm not quite sure how this should interact with the --repack flag. Right now they just coexist, except for never running right after one another, but conceivably we should do something cleverer. Eric? 
 git-svn.perl |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 9f2b587..89e1d61 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1247,7 +1247,7 @@ use File::Path qw/mkpath/;
 use File::Copy qw/copy/;
 use IPC::Open3;
 
-my $_repack_nr;
+my ($_repack_nr, $_gc_nr, $_gc_period);
 # properties that we do not log:
 my %SKIP_PROP;
 BEGIN {
@@ -1413,6 +1413,8 @@ sub init_vars {
 		$_repack_nr = $_repack;
 		$_repack_flags ||= '-d';
 	}
+	$_gc_period = 100;
+	$_gc_nr = int(rand($_gc_period)) + 1;
 }
 
 sub verify_remotes_sanity {
@@ -2157,6 +2159,9 @@ sub do_git_commit {
 		print "Running git repack $_repack_flags ...\n";
 		command_noisy('repack', split(/\s+/, $_repack_flags));
 		print "Done repacking\n";
+	} elsif (--$_gc_nr == 0) {
+		$_gc_nr = $_gc_period;
+		command_noisy('gc', '--auto');
 	}
 	return $commit;
 }

^ permalink raw reply related [flat|nested] 30+ messages in thread
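The countdown that the patch above adds to git-svn.perl can be sketched in shell as follows. This is only an illustration under stated assumptions, not code from git-svn: the names GC_CMD, gc_init and gc_tick are hypothetical, and GC_CMD exists purely so the logic can be tried without a repository. It covers only the gc counter, not the interaction with --repack.

```shell
#!/bin/sh
# Hypothetical hook: the command to run when the countdown expires.
# In real use this would be "git gc --auto".
GC_CMD=${GC_CMD:-"git gc --auto"}
gc_period=100

# Start the countdown somewhere in 1..gc_period, mirroring
# int(rand($_gc_period)) + 1 in the patch, so that short fetches
# still trigger a gc every now and then.
gc_init() {
    gc_nr=$(awk -v p="$gc_period" 'BEGIN { srand(); print int(rand() * p) + 1 }')
}

# Call once per imported commit: decrement first, and when the counter
# reaches zero re-arm it for a full period and run GC_CMD.
gc_tick() {
    gc_nr=$((gc_nr - 1))
    if [ "$gc_nr" -eq 0 ]; then
        gc_nr=$gc_period
        $GC_CMD
    fi
}
```

With a random starting value, a run that imports only a handful of commits has roughly a (commits / gc_period) chance of triggering a gc, which is the "almost as good as saving the counter" argument made in the commit message.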
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-19 22:36 ` [PATCH] Let "git svn" run "git gc --auto" occasionally Karl Hasselström @ 2008-01-19 22:50 ` Harvey Harrison 2008-01-20 3:37 ` Eric Wong 0 siblings, 1 reply; 30+ messages in thread From: Harvey Harrison @ 2008-01-19 22:50 UTC (permalink / raw) To: Karl Hasselström; +Cc: Eric Wong, git, Kevin Ballard, Junio C Hamano On Sat, 2008-01-19 at 23:36 +0100, Karl Hasselström wrote: > Let "git svn" run "git gc --auto" every 100 imported commits, to > reduce the number of loose objects. I found 100 was a bit too low when doing some large repos, I've been using 1000. I'd argue that --repack=1000 should be done by default. > I'm not quite sure how this should interact with the --repack flag. > Right now they just coexist, except for never running right after one > another, but conceivably we should do something cleverer. Eric? > How about git gc always gets run at the very end of a git svn fetch? Just a thought. Harvey ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-19 22:50 ` Harvey Harrison @ 2008-01-20 3:37 ` Eric Wong 2008-01-20 9:34 ` Karl Hasselström 0 siblings, 1 reply; 30+ messages in thread From: Eric Wong @ 2008-01-20 3:37 UTC (permalink / raw) To: Harvey Harrison; +Cc: Karl Hasselström, git, Kevin Ballard, Junio C Hamano Harvey Harrison <harvey.harrison@gmail.com> wrote: > On Sat, 2008-01-19 at 23:36 +0100, Karl Hasselström wrote: > > Let "git svn" run "git gc --auto" every 100 imported commits, to > > reduce the number of loose objects. > > I found 100 was a bit too low when doing some large repos, I've > been using 1000. I'd argue that --repack=1000 should be done by > default. I've found 100 for repack too low in the past, too, which is why repack defaults to 1000 if no number is specified. I think it should hold for gc --auto, too. > > I'm not quite sure how this should interact with the --repack flag. > > Right now they just coexist, except for never running right after one > > another, but conceivably we should do something cleverer. Eric? I consider --repack is out-of-date now that we have gc --auto. I'm in favor of ripping out repack support in git-svn and just using gc --auto. > How about git gc always gets run at the very end of a git svn fetch? I'd much prefer that we run gc --auto at the end of every fetch instead of doing so randomly for small fetches. -- Eric Wong ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-20 3:37 ` Eric Wong @ 2008-01-20 9:34 ` Karl Hasselström 2008-01-20 19:17 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Karl Hasselström @ 2008-01-20 9:34 UTC (permalink / raw) To: Eric Wong; +Cc: Harvey Harrison, git, Kevin Ballard, Junio C Hamano On 2008-01-19 19:37:37 -0800, Eric Wong wrote: > Harvey Harrison <harvey.harrison@gmail.com> wrote: > > > I found 100 was a bit too low when doing some large repos, I've > > been using 1000. I'd argue that --repack=1000 should be done by > > default. > > I've found 100 for repack too low in the past, too, which is why > repack defaults to 1000 if no number is specified. I think it should > hold for gc --auto, too. OK, I'll change it. But remember, gc --auto doesn't do _anything_ unless it's deemed necessary, so it should behave much better than just plain repack. In theory at least. > I consider --repack is out-of-date now that we have gc --auto. I'm > in favor of ripping out repack support in git-svn and just using gc > --auto. Will do. What should I do with the repack command-line options? Keep them for backwards compatibility but ignore them? > > How about git gc always gets run at the very end of a git svn > > fetch? > > I'd much prefer that we run gc --auto at the end of every fetch > instead of doing so randomly for small fetches. OK, will do. I'll just have to find a good spot to call it from. Hints welcome. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-20 9:34 ` Karl Hasselström @ 2008-01-20 19:17 ` Junio C Hamano 2008-01-21 22:48 ` Eric Wong 2008-01-20 21:39 ` [PATCH 1/2] git-svn: Don't call git-repack anymore Karl Hasselström 2008-01-20 21:40 ` [PATCH 2/2] Let "git svn" run "git gc --auto" occasionally Karl Hasselström 2 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2008-01-20 19:17 UTC (permalink / raw) To: Karl Hasselström; +Cc: Eric Wong, Harvey Harrison, git, Kevin Ballard Karl Hasselström <kha@treskal.com> writes: > On 2008-01-19 19:37:37 -0800, Eric Wong wrote: > >> Harvey Harrison <harvey.harrison@gmail.com> wrote: >> >> > I found 100 was a bit too low when doing some large repos, I've >> > been using 1000. I'd argue that --repack=1000 should be done by >> > default. >> >> I've found 100 for repack too low in the past, too, which is why >> repack defaults to 1000 if no number is specified. I think it should >> hold for gc --auto, too. > > OK, I'll change it. But remember, gc --auto doesn't do _anything_ > unless it's deemed necessary, so it should behave much better than > just plain repack. In theory at least. Careful. I made the same mistake and it had to be corrected with e0cd252eb0ba6453acd64762625b004aa4cc162b. I think defaulting to --repack=1000 is a sane first step and you guys already have most code for it so that is a very safe thing. Switching to "gc --auto" can be done early post 1.5.4, right? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-20 19:17 ` Junio C Hamano @ 2008-01-21 22:48 ` Eric Wong 2008-01-22 0:30 ` Junio C Hamano 2008-02-03 16:55 ` [PATCH 0/2] "git svn" and "git gc --auto" Karl Hasselström 0 siblings, 2 replies; 30+ messages in thread From: Eric Wong @ 2008-01-21 22:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Karl Hasselström, Harvey Harrison, git, Kevin Ballard Junio C Hamano <gitster@pobox.com> wrote: > Karl Hasselström <kha@treskal.com> writes: > > > On 2008-01-19 19:37:37 -0800, Eric Wong wrote: > > > >> Harvey Harrison <harvey.harrison@gmail.com> wrote: > >> > >> > I found 100 was a bit too low when doing some large repos, I've > >> > been using 1000. I'd argue that --repack=1000 should be done by > >> > default. > >> > >> I've found 100 for repack too low in the past, too, which is why > >> repack defaults to 1000 if no number is specified. I think it should > >> hold for gc --auto, too. > > > > OK, I'll change it. But remember, gc --auto doesn't do _anything_ > > unless it's deemed necessary, so it should behave much better than > > just plain repack. In theory at least. > > Careful. I made the same mistake and it had to be corrected with > e0cd252eb0ba6453acd64762625b004aa4cc162b. > > I think defaulting to --repack=1000 is a sane first step and you > guys already have most code for it so that is a very safe thing. > > Switching to "gc --auto" can be done early post 1.5.4, right? Sorry for the latency[1], ack on both of Karl's patches for post-1.5.4. Here's a conservative change for 1.5.4 (not at all tested): From dbccd8081c6422569a9ca1211e27f56a24fdf3f3 Mon Sep 17 00:00:00 2001 From: Eric Wong <normalperson@yhbt.net> Date: Mon, 21 Jan 2008 14:37:41 -0800 Subject: [PATCH] git-svn: default to repacking every 1000 commits This should reduce disk space usage when doing large imports. We'll be switching to "gc --auto" post-1.5.4 to handle repacking for us. 
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-svn.perl |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 9f2b587..12745d5 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1408,11 +1408,9 @@ sub read_all_remotes {
 }
 
 sub init_vars {
-	if (defined $_repack) {
-		$_repack = 1000 if ($_repack <= 0);
-		$_repack_nr = $_repack;
-		$_repack_flags ||= '-d';
-	}
+	$_repack = 1000 unless (defined $_repack && $_repack > 0);
+	$_repack_nr = $_repack;
+	$_repack_flags ||= '-d';
 }
 
 sub verify_remotes_sanity {
-- 
Eric Wong

[1] - I've been busy with other things and will also be traveling this week, too.

^ permalink raw reply related [flat|nested] 30+ messages in thread
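The defaulting rule in the patch above (fall back to 1000 unless a positive interval was given) can be illustrated with a small shell function. This is a sketch only: repack_interval is a hypothetical name and nothing like it exists in git-svn.

```shell
#!/bin/sh
# repack_interval [VALUE]: print the interval the rule above would pick.
# A missing, empty, non-numeric, negative or zero VALUE falls back to 1000,
# mirroring: $_repack = 1000 unless (defined $_repack && $_repack > 0);
repack_interval() {
    case "${1:-}" in
        '' | *[!0-9]*)
            echo 1000 ;;                # unset, empty, negative or non-numeric
        *)
            if [ "$1" -gt 0 ]; then
                echo "$1"               # a usable positive interval
            else
                echo 1000               # zero also gets the default
            fi ;;
    esac
}
```

Under this rule an explicit interval of 0 is treated the same as no interval at all, so there is no value that switches repacking off.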
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-21 22:48 ` Eric Wong @ 2008-01-22 0:30 ` Junio C Hamano 2008-01-22 0:39 ` Eric Wong 2008-02-03 16:55 ` [PATCH 0/2] "git svn" and "git gc --auto" Karl Hasselström 1 sibling, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2008-01-22 0:30 UTC (permalink / raw) To: Eric Wong; +Cc: Karl Hasselström, Harvey Harrison, git, Kevin Ballard Eric Wong <normalperson@yhbt.net> writes: > Here's a conservative change for 1.5.4 (not at all tested): > > From dbccd8081c6422569a9ca1211e27f56a24fdf3f3 Mon Sep 17 00:00:00 2001 > From: Eric Wong <normalperson@yhbt.net> > Date: Mon, 21 Jan 2008 14:37:41 -0800 > Subject: [PATCH] git-svn: default to repacking every 1000 commits > > This should reduce disk space usage when doing large imports. > We'll be switching to "gc --auto" post-1.5.4 to handle > repacking for us. > > Signed-off-by: Eric Wong <normalperson@yhbt.net> > --- > git-svn.perl | 8 +++----- > 1 files changed, 3 insertions(+), 5 deletions(-) > > diff --git a/git-svn.perl b/git-svn.perl > index 9f2b587..12745d5 100755 > --- a/git-svn.perl > +++ b/git-svn.perl > @@ -1408,11 +1408,9 @@ sub read_all_remotes { > } > > sub init_vars { > - if (defined $_repack) { > - $_repack = 1000 if ($_repack <= 0); > - $_repack_nr = $_repack; > - $_repack_flags ||= '-d'; > - } > + $_repack = 1000 unless (defined $_repack && $_repack > 0); > + $_repack_nr = $_repack; > + $_repack_flags ||= '-d'; > } > > sub verify_remotes_sanity { Thanks, but I think you need to do something about this part: 2154: if (defined $_repack && (--$_repack_nr == 0)) { I'd say if ($_repack && (--$_repack_nr == 0)) { ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-22 0:30 ` Junio C Hamano @ 2008-01-22 0:39 ` Eric Wong 2008-01-22 1:52 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Eric Wong @ 2008-01-22 0:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: Karl Hasselström, Harvey Harrison, git, Kevin Ballard Junio C Hamano <gitster@pobox.com> wrote: > Eric Wong <normalperson@yhbt.net> writes: > > > Here's a conservative change for 1.5.4 (not at all tested): > > > > From dbccd8081c6422569a9ca1211e27f56a24fdf3f3 Mon Sep 17 00:00:00 2001 > > From: Eric Wong <normalperson@yhbt.net> > > Date: Mon, 21 Jan 2008 14:37:41 -0800 > > Subject: [PATCH] git-svn: default to repacking every 1000 commits > > > > This should reduce disk space usage when doing large imports. > > We'll be switching to "gc --auto" post-1.5.4 to handle > > repacking for us. > > > > Signed-off-by: Eric Wong <normalperson@yhbt.net> > > --- > > git-svn.perl | 8 +++----- > > 1 files changed, 3 insertions(+), 5 deletions(-) > > > > diff --git a/git-svn.perl b/git-svn.perl > > index 9f2b587..12745d5 100755 > > --- a/git-svn.perl > > +++ b/git-svn.perl > > @@ -1408,11 +1408,9 @@ sub read_all_remotes { > > } > > > > sub init_vars { > > - if (defined $_repack) { > > - $_repack = 1000 if ($_repack <= 0); > > - $_repack_nr = $_repack; > > - $_repack_flags ||= '-d'; > > - } > > + $_repack = 1000 unless (defined $_repack && $_repack > 0); > > + $_repack_nr = $_repack; > > + $_repack_flags ||= '-d'; > > } > > > > sub verify_remotes_sanity { > > Thanks, but I think you need to do something about this part: > > 2154: if (defined $_repack && (--$_repack_nr == 0)) { > > I'd say > > if ($_repack && (--$_repack_nr == 0)) { init_vars() is called unconditionally, and always defines $_repack. It could actually just be: if (--$_repack_nr == 0) { -- Eric Wong ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Let "git svn" run "git gc --auto" occasionally 2008-01-22 0:39 ` Eric Wong @ 2008-01-22 1:52 ` Junio C Hamano 2008-01-23 2:43 ` git filter-branch should run git gc --auto Kevin Ballard 0 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2008-01-22 1:52 UTC (permalink / raw) To: Eric Wong; +Cc: Karl Hasselström, Harvey Harrison, git, Kevin Ballard Eric Wong <normalperson@yhbt.net> writes: >> > sub init_vars { >> > - if (defined $_repack) { >> > - $_repack = 1000 if ($_repack <= 0); >> > - $_repack_nr = $_repack; >> > - $_repack_flags ||= '-d'; >> > - } >> > + $_repack = 1000 unless (defined $_repack && $_repack > 0); >> > + $_repack_nr = $_repack; >> > + $_repack_flags ||= '-d'; >> > } >> > >> > sub verify_remotes_sanity { >> >> Thanks, but I think you need to do something about this part: >> >> 2154: if (defined $_repack && (--$_repack_nr == 0)) { >> >> I'd say >> >> if ($_repack && (--$_repack_nr == 0)) { > > init_vars() is called unconditionally, and always defines $_repack. > It could actually just be: > > if (--$_repack_nr == 0) { But that means predecremented --$_repack_nr will count -1, -2, ... until it wraps around when the user said "--repack=0", meaning "never repack". Instead you made it "do not repack for a many many many rounds". Which would be perfectly fine in practice but somehow feels a bit dirty to me. ^ permalink raw reply [flat|nested] 30+ messages in thread
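The concern in the message above is about a pre-decremented counter whose period is 0: it steps to -1, -2, ... and never compares equal to zero, so the action silently never fires. A minimal sketch, assuming (as that message does) that a period of 0 reaches the counter unchanged; countdown_fires is a hypothetical helper, not part of git-svn.

```shell
#!/bin/sh
# countdown_fires PERIOD ROUNDS: start a counter at PERIOD, pre-decrement it
# once per round, and count how often it hits exactly zero. On a hit the
# counter is re-armed to PERIOD, as the git-svn code does with $_repack_nr.
countdown_fires() {
    period=$1
    rounds=$2
    nr=$period
    fires=0
    while [ "$rounds" -gt 0 ]; do
        nr=$((nr - 1))
        if [ "$nr" -eq 0 ]; then
            fires=$((fires + 1))
            nr=$period
        fi
        rounds=$((rounds - 1))
    done
    echo "$fires"
}
```

With a period of 1000 the counter fires once per 1000 rounds. With a period of 0 it only ever moves further below zero, which works as "never" in practice but relies on the test never matching rather than on an explicit check.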
* git filter-branch should run git gc --auto 2008-01-22 1:52 ` Junio C Hamano @ 2008-01-23 2:43 ` Kevin Ballard 2008-01-23 2:46 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Kevin Ballard @ 2008-01-23 2:43 UTC (permalink / raw) To: Git Mailing List [-- Attachment #1: Type: text/plain, Size: 996 bytes --] I just glanced at git-filter-branch.sh (and I must say I was incredibly surprised to find out it was a shell script) and it seems it never runs git-gc or git-repack. Doesn't that end up with the same problems as git-svn sans git-repack when filtering a large number of commits? I was just thinking, if I were to git-filter-branch on my massive repo (in fact, the same repo that started this thread, with over 33000 commits in the upstream svn repo), even if I just do something as simple as change the commit msg won't I end up with thousands of unreachable objects? I shudder to think how many unreachable objects I would have if I pruned the entire dports directory off of the tree. Am I missing something, or does git-filter-branch really not do any garbage collection? I tried reading the source, but complex bash scripts are almost as bad as perl in terms of readability. -Kevin Ballard -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 2432 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git filter-branch should run git gc --auto 2008-01-23 2:43 ` git filter-branch should run git gc --auto Kevin Ballard @ 2008-01-23 2:46 ` Junio C Hamano 2008-01-23 2:52 ` Junio C Hamano ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Junio C Hamano @ 2008-01-23 2:46 UTC (permalink / raw) To: Kevin Ballard; +Cc: Git Mailing List Kevin Ballard <kevin@sb.org> writes: > I just glanced at git-filter-branch.sh (and I must say I was > incredibly surprised to find out it was a shell script) and it seems > it never runs git-gc or git-repack. Doesn't that end up with the same > problems as git-svn sans git-repack when filtering a large number of > commits? I was just thinking, if I were to git-filter-branch on my > massive repo (in fact, the same repo that started this thread, with > over 33000 commits in the upstream svn repo), even if I just do > something as simple as change the commit msg wont I end up with > thousands of unreachable objects? I shudder to think how many > unreachable objects I would have if I pruned the entire dports > directory off of the tree. > > Am I missing something, or does git-filter-branch really not do any > garbage collection? I tried reading the source, but complex bash > scripts are almost as bad as perl in terms of readability. Theoretically yes, and it largely depends on what you do, but filter-branch goes over the objects that already exists in your repository, and hopefully you won't be rewriting majority of them. So the impact of not repacking is probably much less painful in practice. But again as I said, it largely depends on what you do in your filter. If you are upcasing (or convert to NFD ;-)) the contents of all of your blob objects, you would certainly want to repack every once in a while. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git filter-branch should run git gc --auto 2008-01-23 2:46 ` Junio C Hamano @ 2008-01-23 2:52 ` Junio C Hamano 2008-01-23 3:03 ` Kevin Ballard 2008-01-23 2:54 ` Harvey Harrison ` (2 subsequent siblings) 3 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2008-01-23 2:52 UTC (permalink / raw) To: Kevin Ballard; +Cc: Git Mailing List Junio C Hamano <gitster@pobox.com> writes: > Kevin Ballard <kevin@sb.org> writes: > >> I just glanced at git-filter-branch.sh (and I must say I was >> incredibly surprised to find out it was a shell script) and it seems >> it never runs git-gc or git-repack. Doesn't that end up with the same >> problems as git-svn sans git-repack when filtering a large number of >> commits? I was just thinking, if I were to git-filter-branch on my >> massive repo (in fact, the same repo that started this thread, with >> over 33000 commits in the upstream svn repo), even if I just do >> something as simple as change the commit msg wont I end up with >> thousands of unreachable objects? I shudder to think how many >> unreachable objects I would have if I pruned the entire dports >> directory off of the tree. Another thing I forgot to say in my previous message. The old refs are kept in reflogs and also in refs/original/, so you will not be creating new unreachables even if you rewrite many objects. >> Am I missing something, or does git-filter-branch really not do any >> garbage collection? I tried reading the source, but complex bash >> scripts are almost as bad as perl in terms of readability. > > Theoretically yes, and it largely depends on what you do, but > filter-branch goes over the objects that already exists in your > repository, and hopefully you won't be rewriting majority of > them. > > So the impact of not repacking is probably much less painful in > practice. > > But again as I said, it largely depends on what you do in your > filter. 
If you are upcasing (or convert to NFD ;-)) the
> contents of all of your blob objects, you would certainly want
> to repack every once in a while.

Something like this, perhaps?

 git-filter-branch.sh |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index ebf05ca..8e44001 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -299,6 +299,12 @@ while read commit parents; do
 		die "msg filter failed: $filter_msg"
 	sh -c "$filter_commit" "git commit-tree" \
 		$(git write-tree) $parentstr < ../message > ../map/$commit
+
+	if test $(( $i % 512 )) = 0
+	then
+		git gc --auto
+	fi
+
 done <../revs
 
 # In case of a subdirectory filter, it is possible that a specified head

^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: git filter-branch should run git gc --auto 2008-01-23 2:52 ` Junio C Hamano @ 2008-01-23 3:03 ` Kevin Ballard 0 siblings, 0 replies; 30+ messages in thread From: Kevin Ballard @ 2008-01-23 3:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List [-- Attachment #1: Type: text/plain, Size: 2371 bytes --] On Jan 22, 2008, at 9:52 PM, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Kevin Ballard <kevin@sb.org> writes: >> >>> Am I missing something, or does git-filter-branch really not do any >>> garbage collection? I tried reading the source, but complex bash >>> scripts are almost as bad as perl in terms of readability. >> >> Theoretically yes, and it largely depends on what you do, but >> filter-branch goes over the objects that already exists in your >> repository, and hopefully you won't be rewriting majority of >> them. >> >> So the impact of not repacking is probably much less painful in >> practice. >> >> But again as I said, it largely depends on what you do in your >> filter. If you are upcasing (or convert to NFD ;-)) the >> contents of all of your blob objects, you would certainly want >> to repack every once in a while. > > Something like this, perhaps? > > git-filter-branch.sh | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/git-filter-branch.sh b/git-filter-branch.sh > index ebf05ca..8e44001 100755 > --- a/git-filter-branch.sh > +++ b/git-filter-branch.sh > @@ -299,6 +299,12 @@ while read commit parents; do > die "msg filter failed: $filter_msg" > sh -c "$filter_commit" "git commit-tree" \ > $(git write-tree) $parentstr < ../message > ../map/$commit > + > + if test $(( $i % 512 )) = 0 > + then > + git gc --auto > + fi > + > done <../revs > > # In case of a subdirectory filter, it is possible that a specified > head > Offhand that looks good, but we'd probably want to unilaterally do another git-gc when we're done. 
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index ebf05ca..32274a6 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -299,8 +299,16 @@ while read commit parents; do
 		die "msg filter failed: $filter_msg"
 	sh -c "$filter_commit" "git commit-tree" \
 		$(git write-tree) $parentstr < ../message > ../map/$commit
+
+	if test $(( $i % 512 )) = 0
+	then
+		git gc --auto
+	fi
+
 done <../revs
 
+git gc --auto
+
 # In case of a subdirectory filter, it is possible that a specified head
 # is not in the set of rewritten commits, because it was pruned by the
 # revision walker. Fix it by mapping these heads to the next rewritten

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 2432 bytes --]

^ permalink raw reply related [flat|nested] 30+ messages in thread
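The pattern proposed in the two patches above (a gc every 512 commits inside the loop, plus one unconditional gc once the loop is done) can be tried in isolation with the sketch below. It is an illustration only, not the filter-branch code: rewrite_all and GC_CMD are hypothetical names, the per-commit work is a placeholder, and the counter i is assumed to be incremented once per commit.

```shell
#!/bin/sh
# Hypothetical hook so the loop can be exercised without a repository;
# in real use this would be "git gc --auto".
GC_CMD=${GC_CMD:-"git gc --auto"}

# rewrite_all: read one commit id per line on stdin, process each one,
# run GC_CMD after every 512th commit, and run it once more at the end.
rewrite_all() {
    i=0
    while read -r commit; do
        i=$((i + 1))
        : "rewrite $commit here"        # placeholder for the real filter work
        if [ $((i % 512)) -eq 0 ]; then
            $GC_CMD
        fi
    done
    $GC_CMD                             # final pass after the loop
}
```

For a list of 1024 commits this invokes GC_CMD three times: after commit 512, after commit 1024, and once more when the loop finishes. Because "gc --auto" checks its own thresholds before doing any work, the final call is cheap when the preceding one has already packed everything.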
* Re: git filter-branch should run git gc --auto 2008-01-23 2:46 ` Junio C Hamano 2008-01-23 2:52 ` Junio C Hamano @ 2008-01-23 2:54 ` Harvey Harrison 2008-01-23 2:58 ` Kevin Ballard 2008-01-23 6:44 ` Mike Hommey 3 siblings, 0 replies; 30+ messages in thread From: Harvey Harrison @ 2008-01-23 2:54 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kevin Ballard, Git Mailing List On Tue, 2008-01-22 at 18:46 -0800, Junio C Hamano wrote: > Kevin Ballard <kevin@sb.org> writes: > > > I just glanced at git-filter-branch.sh (and I must say I was > > incredibly surprised to find out it was a shell script) and it seems > > it never runs git-gc or git-repack. Doesn't that end up with the same > > problems as git-svn sans git-repack when filtering a large number of > > commits? I was just thinking, if I were to git-filter-branch on my > > massive repo (in fact, the same repo that started this thread, with > > over 33000 commits in the upstream svn repo), even if I just do > > something as simple as change the commit msg wont I end up with > > thousands of unreachable objects? I shudder to think how many > > unreachable objects I would have if I pruned the entire dports > > directory off of the tree. > > > > Am I missing something, or does git-filter-branch really not do any > > garbage collection? I tried reading the source, but complex bash > > scripts are almost as bad as perl in terms of readability. > > Theoretically yes, and it largely depends on what you do, but > filter-branch goes over the objects that already exists in your > repository, and hopefully you won't be rewriting majority of > them. > > So the impact of not repacking is probably much less painful in > practice. And afterwards, you'll probably want to check the rewritten history to make sure it is acceptable before doing a git gc --prune. Cheers, Harvey ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git filter-branch should run git gc --auto
From: Kevin Ballard @ 2008-01-23 2:58 UTC
To: Junio C Hamano; +Cc: Git Mailing List

On Jan 22, 2008, at 9:46 PM, Junio C Hamano wrote:

> Kevin Ballard <kevin@sb.org> writes:
>
>> I just glanced at git-filter-branch.sh (and I must say I was
>> incredibly surprised to find out it was a shell script) and it seems
>> it never runs git-gc or git-repack. Doesn't that end up with the same
>> problems as git-svn sans git-repack when filtering a large number of
>> commits? I was just thinking, if I were to git-filter-branch on my
>> massive repo (in fact, the same repo that started this thread, with
>> over 33000 commits in the upstream svn repo), even if I just do
>> something as simple as change the commit msg wont I end up with
>> thousands of unreachable objects? I shudder to think how many
>> unreachable objects I would have if I pruned the entire dports
>> directory off of the tree.
>>
>> Am I missing something, or does git-filter-branch really not do any
>> garbage collection? I tried reading the source, but complex bash
>> scripts are almost as bad as perl in terms of readability.
>
> Theoretically yes, and it largely depends on what you do, but
> filter-branch goes over the objects that already exists in your
> repository, and hopefully you won't be rewriting majority of
> them.
>
> So the impact of not repacking is probably much less painful in
> practice.
>
> But again as I said, it largely depends on what you do in your
> filter. If you are upcasing (or convert to NFD ;-)) the
> contents of all of your blob objects, you would certainly want
> to repack every once in a while.
I'm actually considering what the cost would be of switching macports
to git (not that it will ever happen - too many anonymous people pull
from svn trunk). Right now the svn trunk contains a subfolder for the
source code and another subfolder for all ~4400+ Portfiles. In such a
theoretical move, I'd want to split that up, probably into two
unrelated branches.

Doing so would mean running git-filter-branch over a linear commit
history that's 31580 objects long, with a tree filter to prune the
dports directory away and a msg filter to remove the svn-id stuff that
git-svn left behind. This means that every single commit object would
be changed, as well as the root tree object for every single commit.
That would be about 63160 objects. I'd also have to figure out some way
to entirely remove the commit objects that only reference the dports
directory. Then I'd have to do it again with the opposite tree filter
(to prune everything but the dports directory and move the contents of
the dports directory up one level) and the same msg filter.

Granted, if I do the first operation in a branch, that leaves no
unreachable objects (since the originals are still referenced), but the
second operation definitely would leave unreachable objects. And were I
to clone the repository instead and do the operations in separate repos
(which is perfectly legitimate - otherwise I'd have to clone it after
everything else and then delete branches), both operations would leave
thousands of objects unreachable.

I'd suggest a patch to run git gc --auto, but it looks like you just
did in a subsequent email. As for your comments about the reflogs,
can't I disable recording those, at least temporarily? I'd rather clean
up after myself as I work than balloon the repository and collapse it
in a single operation at the end.
-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
* Re: git filter-branch should run git gc --auto
From: Sam Vilain @ 2008-01-23 5:07 UTC
To: Kevin Ballard; +Cc: Junio C Hamano, Git Mailing List

Kevin Ballard wrote:
> I'm actually considering what the cost would be of switching macports
> to git (not that it will ever happen - too many anonymous people pull
> from svn trunk). Right now the svn trunk contains a subfolder for the
> source code and another subfolder for all ~4400+ Portfiles. In such a
> theoretical move, I'd want to split that up, probably into two
> unrelated branches. Doing so would mean running git-filter-branch over
> a linear commit history that's 31580 objects long, with a tree filter
> to prune the dports directory away and a msg filter to remove the svn-
> id stuff that git-svn left behind.

You could have used git-svn --no-metadata :)

Using a commit filter to implement the pruning will be much faster;
you'll need to make a temporary index, use git-read-tree, git-rm, then
git-commit. This way you avoid the expense of checking out the files
just to delete them in your rewrite hook.

> I'd also have to
> figure out some way to remove the commit objects entirely that only
> reference the dports directory.

This can be done with a parent filter.

> I'd suggest a patch to run git gc --auto, but it looks like you just
> did in a subsequent email. As for your comments about the reflogs,
> can't I disable recording those, at least temporarily? I'd rather
> clean up after myself as I work rather than balloon the repository and
> collapse it in a single operation at the end.

Honestly, the optimisation I mention above will save you much more time.
Note that you can run git-repack -d every half hour out of cron, it is
safe and will let it clean as you go.

Sam.
* Re: git filter-branch should run git gc --auto
From: Kevin Ballard @ 2008-01-23 8:18 UTC
To: Sam Vilain; +Cc: Junio C Hamano, Git Mailing List

On Jan 23, 2008, at 12:07 AM, Sam Vilain wrote:

> Kevin Ballard wrote:
>> I'm actually considering what the cost would be of switching macports
>> to git (not that it will ever happen - too many anonymous people pull
>> from svn trunk). Right now the svn trunk contains a subfolder for the
>> source code and another subfolder for all ~4400+ Portfiles. In such a
>> theoretical move, I'd want to split that up, probably into two
>> unrelated branches. Doing so would mean running git-filter-branch over
>> a linear commit history that's 31580 objects long, with a tree filter
>> to prune the dports directory away and a msg filter to remove the svn-
>> id stuff that git-svn left behind.
>
> You could have used git-svn --no-metadata :)

Sure, except I imported the svn repo with the intention of continuing
to track it. I'm only floating the idea now of converting the upstream
repo to git, but as I said before we have enough anonymous checkouts of
people tracking trunk that we probably can't justify switching VCSs,
especially when svn is now bundled on Leopard but git isn't.

> Using a commit filter to implement the pruning will be much faster;
> you'll need to make a temporary index, use git-read-tree, git-rm, then
> git-commit. This way you avoid the expense of checking out the files
> just to delete them in your rewrite hook.

I suspect an index filter would be simpler, and that's really what I
meant when I said tree filter.

>> I'd also have to
>> figure out some way to remove the commit objects entirely that only
>> reference the dports directory.
>
> This can be done with a parent filter.

Good to know.
>> I'd suggest a patch to run git gc --auto, but it looks like you just
>> did in a subsequent email. As for your comments about the reflogs,
>> can't I disable recording those, at least temporarily? I'd rather
>> clean up after myself as I work rather than balloon the repository
>> and collapse it in a single operation at the end.
>
> Honestly, the optimisation I mention above will save you much more
> time.
> Note that you can run git-repack -d every half hour out of cron, it is
> safe and will let it clean as you go.

That's a reasonable suggestion. And I'm still just thinking about this,
so I have no idea if I'll ever actually have to run git-filter-branch
on this massive history.

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
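The split discussed in this sub-thread could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe for the MacPorts history: the directory name `dports` comes from the messages above, the function names are made up here, and the commands rewrite history, so they belong on a throwaway clone. `--index-filter`, `--msg-filter` and `--subdirectory-filter` are documented git-filter-branch options.

```shell
#!/bin/sh
# Sketch only. Function names are illustrative; "dports" is the
# directory named in the thread.

# Message filter: drop the "git-svn-id:" lines that git-svn appends.
strip_svn_id () {
	sed -e '/^git-svn-id:/d'
}

# History without dports/: edit the index directly instead of checking
# out every tree (the "index filter" Kevin mentions).
split_out_source () {
	git filter-branch \
		--index-filter 'git rm -r -q --cached --ignore-unmatch dports' \
		--msg-filter 'sed -e "/^git-svn-id:/d"' \
		-- --all
}

# History of dports/ only, with its contents moved up one level.
split_out_ports () {
	git filter-branch \
		--subdirectory-filter dports \
		--msg-filter 'sed -e "/^git-svn-id:/d"' \
		-- --all
}
```

Nothing here drops the commits that end up touching only the removed directory in the first case; the thread suggests a parent or commit filter for that part, which this sketch leaves out.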
* Re: git filter-branch should run git gc --auto
From: Mike Hommey @ 2008-01-23 6:44 UTC
To: Junio C Hamano; +Cc: Kevin Ballard, Git Mailing List

On Tue, Jan 22, 2008 at 06:46:52PM -0800, Junio C Hamano wrote:
> Kevin Ballard <kevin@sb.org> writes:
>
> > I just glanced at git-filter-branch.sh (and I must say I was
> > incredibly surprised to find out it was a shell script) and it seems
> > it never runs git-gc or git-repack. Doesn't that end up with the same
> > problems as git-svn sans git-repack when filtering a large number of
> > commits? I was just thinking, if I were to git-filter-branch on my
> > massive repo (in fact, the same repo that started this thread, with
> > over 33000 commits in the upstream svn repo), even if I just do
> > something as simple as change the commit msg wont I end up with
> > thousands of unreachable objects? I shudder to think how many
> > unreachable objects I would have if I pruned the entire dports
> > directory off of the tree.
> >
> > Am I missing something, or does git-filter-branch really not do any
> > garbage collection? I tried reading the source, but complex bash
> > scripts are almost as bad as perl in terms of readability.
>
> Theoretically yes, and it largely depends on what you do, but
> filter-branch goes over the objects that already exists in your
> repository, and hopefully you won't be rewriting majority of
> them.
>
> So the impact of not repacking is probably much less painful in
> practice.
>
> But again as I said, it largely depends on what you do in your
> filter. If you are upcasing (or convert to NFD ;-)) the
> contents of all of your blob objects, you would certainly want
> to repack every once in a while.
I wonder if it wouldn't be possible to have filter-branch use
fast-import, so that it would create a pack instead of a lot of loose
objects.

Mike
* Re: git filter-branch should run git gc --auto
From: Johannes Schindelin @ 2008-01-23 13:00 UTC
To: Mike Hommey; +Cc: Junio C Hamano, Kevin Ballard, Git Mailing List

Hi,

On Wed, 23 Jan 2008, Mike Hommey wrote:

> On Tue, Jan 22, 2008 at 06:46:52PM -0800, Junio C Hamano wrote:
> > Kevin Ballard <kevin@sb.org> writes:
> >
> > > I just glanced at git-filter-branch.sh (and I must say I was
> > > incredibly surprised to find out it was a shell script) and it seems
> > > it never runs git-gc or git-repack. Doesn't that end up with the
> > > same problems as git-svn sans git-repack when filtering a large
> > > number of commits? I was just thinking, if I were to
> > > git-filter-branch on my massive repo (in fact, the same repo that
> > > started this thread, with over 33000 commits in the upstream svn
> > > repo), even if I just do something as simple as change the commit
> > > msg wont I end up with thousands of unreachable objects? I shudder
> > > to think how many unreachable objects I would have if I pruned the
> > > entire dports directory off of the tree.
> > >
> > > Am I missing something, or does git-filter-branch really not do any
> > > garbage collection? I tried reading the source, but complex bash
> > > scripts are almost as bad as perl in terms of readability.
> >
> > Theoretically yes, and it largely depends on what you do, but
> > filter-branch goes over the objects that already exists in your
> > repository, and hopefully you won't be rewriting majority of them.
> >
> > So the impact of not repacking is probably much less painful in
> > practice.
> >
> > But again as I said, it largely depends on what you do in your filter.
> > If you are upcasing (or convert to NFD ;-)) the contents of all of
> > your blob objects, you would certainly want to repack every once in a
> > while.
>
> I wonder if it wouldn't be possible to have filter-branch use
> fast-import, so that it would create a pack instead of a lot of loose
> objects.

Not really; the filters are very much tuned to the index-modification
and commit process.

And I doubt that the gc --auto would help much; git-filter-branch
creates gazillions of files, and that is likely to bring performance
down. If, that is, you choose _not_ to heed the comment in
Documentation/git-filter-branch.txt lines 44-46:

	Note that since this operation is extensively I/O expensive, it
	might be a good idea to redirect the temporary directory off-disk
	with the '-d' option, e.g. on tmpfs. Reportedly the speedup is
	very noticeable.

Ciao,
Dscho
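The '-d' advice quoted above can be sketched as a small helper. This is only an illustration: `pick_filter_tmpdir` and the directory name are invented here, and treating /dev/shm as RAM-backed is an assumption that holds on many Linux systems but not everywhere.

```shell
#!/bin/sh
# Sketch only: pick_filter_tmpdir and the directory name are illustrative.

# Print a scratch path for filter-branch, preferring a location that is
# commonly RAM-backed (/dev/shm) and falling back to $TMPDIR or /tmp.
# The path is printed but not created.
pick_filter_tmpdir () {
	for base in /dev/shm "${TMPDIR:-/tmp}"
	do
		if test -d "$base" && test -w "$base"
		then
			printf '%s\n' "$base/filter-branch-tmp.$$"
			return 0
		fi
	done
	return 1
}

# Example invocation (not run here):
#   git filter-branch -d "$(pick_filter_tmpdir)" --msg-filter 'cat' -- --all
```

The helper deliberately does not create the directory, since as far as I know git-filter-branch expects to create its temporary directory itself and stops if the path already exists.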
* Re: git filter-branch should run git gc --auto
From: Junio C Hamano @ 2008-01-23 19:22 UTC
To: Mike Hommey; +Cc: Kevin Ballard, Git Mailing List

Mike Hommey <mh@glandium.org> writes:

> I wonder if it wouldn't be possible to have filter-branch use
> fast-import, so that it would create a pack instead of a lot of loose
> objects.

I do not think it will help.

The objects in packs fast-import creates cannot be accessed from
outside fast-import. Not even the rest of the core routines running
inside that fast-import process can access them via the usual
read_sha1_file() interface, as described in detail in a recent thread
[*1*].

The only way to make them available while you are still feeding new
data to fast-import is to explicitly tell it to finalize the current
pack by issuing a 'checkpoint' command (and fast-import will start
writing to a new pack). And filters need to be able to read the objects
previous steps produced to do their work.

Which means that instead of having to deal with many loose objects, you
will now face many little packs, each containing perhaps at most one
commit's worth of changed data. You would need to "repack -a -d" to
consolidate these little packs every once in a while, and I suspect
more often than you would need to repack loose objects, as handling
many packs is much more expensive than handling many loose objects.

[Reference]

*1* http://thread.gmane.org/gmane.comp.version-control.git/70964/focus=71076
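The "many little packs" concern can be made concrete with a small check. This is a hedged sketch, not something proposed in the thread: `count_packs`, `consolidate_packs` and the threshold argument are hypothetical, and the default of 50 is chosen to mirror what I understand the gc.autopacklimit default to be.

```shell
#!/bin/sh
# Sketch only: count_packs, consolidate_packs and the threshold are
# illustrative names and values.

# count_packs GIT_DIR: number of *.pack files under GIT_DIR/objects/pack.
count_packs () {
	find "$1/objects/pack" -name '*.pack' 2>/dev/null | wc -l | tr -d ' '
}

# consolidate_packs GIT_DIR [MAX]: once more than MAX packs (default 50)
# have accumulated, fold them into one with "repack -a -d".
consolidate_packs () {
	if test "$(count_packs "$1")" -gt "${2:-50}"
	then
		git --git-dir="$1" repack -a -d
	fi
}
```

Calling `consolidate_packs .git` between batches would be one way to keep the pack count bounded; `git gc --auto` applies a similar pack-count test on its own.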
* [PATCH 0/2] "git svn" and "git gc --auto"
From: Karl Hasselström @ 2008-02-03 16:55 UTC
To: Junio C Hamano; +Cc: git, Eric Wong

On 2008-01-21 14:48:28 -0800, Eric Wong wrote:

> Sorry for the latency[1], ack on both of Karl's patches for
> post-1.5.4.

So here they are again. There was a trivial merge conflict with Eric's
fix, but otherwise they are unchanged.

---

Karl Hasselström (2):
      Let "git svn" run "git gc --auto" occasionally
      git-svn: Don't call git-repack anymore

 git-svn.perl |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle
* [PATCH 1/2] git-svn: Don't call git-repack anymore
From: Karl Hasselström @ 2008-02-03 16:56 UTC
To: Junio C Hamano; +Cc: git, Eric Wong

In a moment, we'll start calling git-gc --auto instead, since it is a
better fit to what we're trying to accomplish.

The command line options are still accepted, but don't have any
effect, and we warn the user about that.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 git-svn.perl |   14 +++-----------
 1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 75e97cc..074068c 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1247,7 +1247,6 @@ use File::Path qw/mkpath/;
 use File::Copy qw/copy/;
 use IPC::Open3;
 
-my $_repack_nr;
 # properties that we do not log:
 my %SKIP_PROP;
 BEGIN {
@@ -1408,9 +1407,9 @@ sub read_all_remotes {
 }
 
 sub init_vars {
-	$_repack = 1000 unless (defined $_repack && $_repack > 0);
-	$_repack_nr = $_repack;
-	$_repack_flags ||= '-d';
+	if (defined $_repack || defined $_repack_flags) {
+		warn "Repack options are obsolete; they have no effect.\n";
+	}
 }
 
 sub verify_remotes_sanity {
@@ -2149,13 +2148,6 @@ sub do_git_commit {
 			0, $self->svm_uuid);
 	}
 	print " = $commit ($self->{ref_id})\n";
-	if ($_repack && (--$_repack_nr == 0)) {
-		$_repack_nr = $_repack;
-		# repack doesn't use any arguments with spaces in them, does it?
-		print "Running git repack $_repack_flags ...\n";
-		command_noisy('repack', split(/\s+/, $_repack_flags));
-		print "Done repacking\n";
-	}
 	return $commit;
 }
 
* [PATCH 2/2] Let "git svn" run "git gc --auto" occasionally
From: Karl Hasselström @ 2008-02-03 16:56 UTC
To: Junio C Hamano; +Cc: git, Eric Wong

Let "git svn" run "git gc --auto" every 1000 imported commits to
reduce the number of loose objects.

To handle the common use case of frequent imports, where each
invocation typically fetches much less than 1000 commits, also run gc
unconditionally at the end of the import.

"1000" is the same number that was used by default when we called
git-repack. It isn't necessarily still the best choice.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 git-svn.perl |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 074068c..6cc3157 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1247,6 +1247,8 @@ use File::Path qw/mkpath/;
 use File::Copy qw/copy/;
 use IPC::Open3;
 
+my ($_gc_nr, $_gc_period);
+
 # properties that we do not log:
 my %SKIP_PROP;
 BEGIN {
@@ -1407,6 +1409,7 @@ sub read_all_remotes {
 }
 
 sub init_vars {
+	$_gc_nr = $_gc_period = 1000;
 	if (defined $_repack || defined $_repack_flags) {
 		warn "Repack options are obsolete; they have no effect.\n";
 	}
@@ -2095,6 +2098,10 @@ sub restore_commit_header_env {
 	}
 }
 
+sub gc {
+	command_noisy('gc', '--auto');
+};
+
 sub do_git_commit {
 	my ($self, $log_entry) = @_;
 	my $lr = $self->last_rev;
@@ -2148,6 +2155,10 @@ sub do_git_commit {
 			0, $self->svm_uuid);
 	}
 	print " = $commit ($self->{ref_id})\n";
+	if (--$_gc_nr == 0) {
+		$_gc_nr = $_gc_period;
+		gc();
+	}
 	return $commit;
 }
 
@@ -3975,6 +3986,7 @@ sub gs_fetch_loop_common {
 		$max += $inc;
 		$max = $head if ($max > $head);
 	}
+	Git::SVN::gc();
 }
 
 sub match_globs {
* [PATCH 1/2] git-svn: Don't call git-repack anymore
From: Karl Hasselström @ 2008-01-20 21:39 UTC
To: Eric Wong; +Cc: git, Harvey Harrison, Kevin Ballard, Junio C Hamano

In a moment, we'll start calling git-gc --auto instead, since it is a
better fit to what we're trying to accomplish.

The command line options are still accepted, but don't have any
effect, and we warn the user about that.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

Is this close enough to what you intended?

 git-svn.perl |   14 ++------------
 1 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 9f2b587..988d8f6 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1247,7 +1247,6 @@ use File::Path qw/mkpath/;
 use File::Copy qw/copy/;
 use IPC::Open3;
 
-my $_repack_nr;
 # properties that we do not log:
 my %SKIP_PROP;
 BEGIN {
@@ -1408,10 +1407,8 @@ sub read_all_remotes {
 }
 
 sub init_vars {
-	if (defined $_repack) {
-		$_repack = 1000 if ($_repack <= 0);
-		$_repack_nr = $_repack;
-		$_repack_flags ||= '-d';
+	if (defined $_repack || defined $_repack_flags) {
+		warn "Repack options are obsolete; they have no effect.\n";
 	}
 }
 
@@ -2151,13 +2148,6 @@ sub do_git_commit {
 			0, $self->svm_uuid);
 	}
 	print " = $commit ($self->{ref_id})\n";
-	if (defined $_repack && (--$_repack_nr == 0)) {
-		$_repack_nr = $_repack;
-		# repack doesn't use any arguments with spaces in them, does it?
-		print "Running git repack $_repack_flags ...\n";
-		command_noisy('repack', split(/\s+/, $_repack_flags));
-		print "Done repacking\n";
-	}
 	return $commit;
 }
 
* [PATCH 2/2] Let "git svn" run "git gc --auto" occasionally
From: Karl Hasselström @ 2008-01-20 21:40 UTC
To: Eric Wong; +Cc: git, Harvey Harrison, Kevin Ballard, Junio C Hamano

Let "git svn" run "git gc --auto" every 1000 imported commits to
reduce the number of loose objects.

To handle the common use case of frequent imports, where each
invocation typically fetches much less than 1000 commits, also run gc
unconditionally at the end of the import.

"1000" is the same number that was used by default when we called
git-repack. It isn't necessarily still the best choice.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 git-svn.perl |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 988d8f6..be4105c 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1247,6 +1247,8 @@ use File::Path qw/mkpath/;
 use File::Copy qw/copy/;
 use IPC::Open3;
 
+my ($_gc_nr, $_gc_period);
+
 # properties that we do not log:
 my %SKIP_PROP;
 BEGIN {
@@ -1407,6 +1409,7 @@ sub read_all_remotes {
 }
 
 sub init_vars {
+	$_gc_nr = $_gc_period = 1000;
 	if (defined $_repack || defined $_repack_flags) {
 		warn "Repack options are obsolete; they have no effect.\n";
 	}
@@ -2095,6 +2098,10 @@ sub restore_commit_header_env {
 	}
 }
 
+sub gc {
+	command_noisy('gc', '--auto');
+};
+
 sub do_git_commit {
 	my ($self, $log_entry) = @_;
 	my $lr = $self->last_rev;
@@ -2148,6 +2155,10 @@ sub do_git_commit {
 			0, $self->svm_uuid);
 	}
 	print " = $commit ($self->{ref_id})\n";
+	if (--$_gc_nr == 0) {
+		$_gc_nr = $_gc_period;
+		gc();
+	}
 	return $commit;
 }
 
@@ -3975,6 +3986,7 @@ sub gs_fetch_loop_common {
 		$max += $inc;
 		$max = $head if ($max > $head);
 	}
+	Git::SVN::gc();
 }
 
 sub match_globs {