* [PATCH] git-repack-script: Add option to repack all objects
@ 2005-08-27  8:41 Frank Sorenson
  2005-08-28 21:06 ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Sorenson @ 2005-08-27  8:41 UTC
To: Git Mailing List, Junio C Hamano

This patch adds an option to git-repack-script to repack all objects,
including both packed and unpacked.  This allows a full repack of
a git archive (current cogito packs from 39MB to 4.5MB, and git packs
from 4.4MB to 3.8MB).

Signed-off-by: Frank Sorenson <frank@tuxrocks.com>

diff --git a/git-repack-script b/git-repack-script
--- a/git-repack-script
+++ b/git-repack-script
@@ -5,10 +5,12 @@
 . git-sh-setup-script || die "Not a git archive"
+repack_all=
 no_update_info=
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
+	--all)	repack_all=t ;;
 	-n)	no_update_info=t ;;
 	*)	break ;;
 	esac
@@ -16,13 +18,22 @@ do
 done
 rm -f .tmp-pack-*
-packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
-	git-pack-objects --non-empty --incremental .tmp-pack) ||
-	exit 1
-if [ -z "$packname" ]; then
-	echo Nothing new to pack
-	exit 0
-fi
+case "$repack_all" in
+t)	packname=$(git-rev-list --objects $(git-rev-parse --all) |
+		git-pack-objects .tmp-pack) ||
+		exit 1
+	find "$GIT_OBJECT_DIRECTORY/"?? -type f | xargs rm -f
+	find "$GIT_OBJECT_DIRECTORY/pack" -type f | xargs rm -f
+	;;
+*)	packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
+		git-pack-objects --non-empty --incremental .tmp-pack) ||
+		exit 1
+	if [ -z "$packname" ]; then
+		echo Nothing new to pack
+		exit 0
+	fi
+	;;
+esac
 mkdir -p "$GIT_OBJECT_DIRECTORY/pack" &&
 mv .tmp-pack-$packname.pack "$GIT_OBJECT_DIRECTORY/pack/pack-$packname.pack" &&

Frank
-- 
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com

* Re: [PATCH] git-repack-script: Add option to repack all objects
  2005-08-27  8:41 [PATCH] git-repack-script: Add option to repack all objects Frank Sorenson
@ 2005-08-28 21:06 ` Junio C Hamano
  2005-08-29  7:41   ` Frank Sorenson
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-08-28 21:06 UTC
To: Frank Sorenson; +Cc: Git Mailing List

Frank Sorenson <frank@tuxrocks.com> writes:

> This patch adds an option to git-repack-script to repack all objects,
> including both packed and unpacked.  This allows a full repack of
> a git archive (current cogito packs from 39MB to 4.5MB, and git packs
> from 4.4MB to 3.8MB).
>
> Signed-off-by: Frank Sorenson <frank@tuxrocks.com>

While I agree that giving more flexibility to repack objects is
a good idea, I am not sure rolling all existing objects into one
pack and removing the existing one is a good way to go.

I'd do this slightly differently.  I do not think removing
existing packs belongs in this command.  We would probably want a
separate tool to find extra/redundant packs and remove them, or
more generally optimize packs by selectively exploding them and
repacking them ("pack optimizer").

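[Editorial illustration, not part of the thread.] As a rough sketch of the "find extra/redundant packs" idea above, the following treats a pack as redundant when every object it holds also appears in some other pack. This is not an existing git tool: the helper name `is_redundant` and the plain-text `*.list` files (one object name per line, standing in for per-pack object listings) are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: is_redundant and the *.list files are hypothetical,
# standing in for per-pack object listings.

# is_redundant CANDIDATE OTHER...
# Succeeds when every object named in CANDIDATE also appears in
# at least one of the OTHER lists.
is_redundant() {
	candidate=$1
	shift
	# With nothing to compare against, the pack cannot be redundant.
	test $# -gt 0 || return 1
	others=$(mktemp) || return 2
	sort -u "$@" >"$others"
	# comm -23 prints lines found only in the first (sorted) input.
	missing=$(sort -u "$candidate" | comm -23 - "$others")
	rm -f "$others"
	test -z "$missing"
}

demo_dir=$(mktemp -d) || exit 1
printf '%s\n' obj1 obj2 >"$demo_dir/first.list"
printf '%s\n' obj3 >"$demo_dir/second.list"
printf '%s\n' obj1 obj2 obj3 >"$demo_dir/super.list"

for pack in first second super
do
	# Compare each list against all the others.
	rest=$(ls "$demo_dir"/*.list | grep -v "/$pack.list\$")
	if is_redundant "$demo_dir/$pack.list" $rest
	then
		echo "$pack: redundant"
	else
		echo "$pack: needed"
	fi
done
rm -rf "$demo_dir"
```

In this toy data set every list is covered by the remaining ones, so each is reported as redundant on its own, yet they obviously cannot all be deleted. That is one way to see why detecting redundancy and choosing what to remove are separate decisions.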
* Re: [PATCH] git-repack-script: Add option to repack all objects
  2005-08-28 21:06 ` Junio C Hamano
@ 2005-08-29  7:41   ` Frank Sorenson
  2005-08-29 16:48     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Sorenson @ 2005-08-29  7:41 UTC
To: Junio C Hamano; +Cc: Git Mailing List

Junio C Hamano wrote:
> Frank Sorenson <frank@tuxrocks.com> writes:
>
>>This patch adds an option to git-repack-script to repack all objects,
>>including both packed and unpacked.  This allows a full repack of
>>a git archive (current cogito packs from 39MB to 4.5MB, and git packs
>>from 4.4MB to 3.8MB).
>>
>>Signed-off-by: Frank Sorenson <frank@tuxrocks.com>
>
> While I agree that giving more flexibility to repack objects is
> a good idea, I am not sure rolling all existing objects into one
> pack and removing the existing one is a good way to go.

It reduces the disk space requirement significantly (linux packs from
135MB to 73MB), and I'm seeing speed improvements as well (probably
because cache-cold operation requires far less seeking, and the caching
requirements are smaller).

What are the benefits to keeping old packs?

> I'd do this slightly differently.  I do not think removing
> existing packs belongs in this command.  We would probably want a
> separate tool to find extra/redundant packs and remove them, or
> more generally optimize packs by selectively exploding them and
> repacking them ("pack optimizer").

I disagree about not removing old packs.  When you "repack" your
suitcase, you take everything out and put it back in again, so a command
named "repack" should remove all existing objects, and put them back again.

Okay, so the pack algorithm could be better, but that only means that
repacking the entire set of objects would improve things more, making
some sort of "git-repack-all" an even more valuable operation.

Frank
-- 
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com

* Re: [PATCH] git-repack-script: Add option to repack all objects
  2005-08-29  7:41   ` Frank Sorenson
@ 2005-08-29 16:48     ` Junio C Hamano
  2005-08-29 17:34       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 16:48 UTC
To: Frank Sorenson; +Cc: Git Mailing List

Frank Sorenson <frank@tuxrocks.com> writes:

> It reduces the disk space requirement significantly (linux packs from
> 135MB to 73MB), and I'm seeing speed improvements as well (probably
> because cache-cold operation requires far less seeking, and the caching
> requirements are smaller).
>
> What are the benefits to keeping old packs?

For a private repository where one does development and from
which one pushes to public repositories, packing everything into
one pack and pruning everything else (including old packs) is
always the optimum thing, if one can afford the time to repack.

There are no benefits to _keeping_ old packs, but there may be
benefits to not packing everything into one huge pack when other
people are involved.

Suppose I currently have three packs (one from the beginning of
time to some time ago, one incremental on top of it, another
incremental on top of the other two).  Somebody cloned from my
repository reasonably early in the project timeline (he has only
the first pack), somebody else cloned yesterday (has all three
packs).  And "git count-objects" reports that many other objects
are unpacked and I decide it is time to repack.

At this point I could pack everything into one new big pack
and remove old packs.  Or I could create the fourth incremental.
Another possibility, which is what I currently do by hand,
is to create a pack that is incremental on top of the first two,
and replace the latest incremental with it.

Now these two people want to fetch from my repository while the
third person wants to clone from scratch.  Which repacking
strategy gives the best transfer to these three people?

Having a single huge pack favors the newcomer and penalizes the
old timers.  In particular, the current http-pull is not smart
enough to pick a better pack when an object is found in more than
one pack, so leaving old packs around would not help.

Leaving the old packs around could help all of them.  In the
above example, I could create the fourth incremental _and_ a
superpack that has everything in it.  The newcomer would slurp
in the superpack, the one with only the first pack can use
either the second+third+fourth or the superpack, and the one
with all three can use the fourth pack.

Having said that, packing has an interesting compression
characteristic.  Repacking the three existing packs (from the
example) along with the unpacked objects into one pack would
result in a very small pack, compared to the sum of the three
existing packs, depending on how often you repack.  In that sense,
it may not be such a big deal to force everybody to re-fetch
everything even if most of them are already locally available,
by repacking everything into one.

> I disagree about not removing old packs.

I am not saying we should not remove old packs.  I am saying that
repacking, choosing which pack to remove and doing the actual
removing should be kept as separate steps and in separate
commands, perhaps the latter two as part of "git prune".

-jc

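[Editorial illustration, not part of the thread.] The trade-off described above can be made concrete with a little arithmetic. The sketch below assumes a dumb transport where a client downloads, in full, every pack it does not already have. The pack sizes and the names `transfer_cost`, `early`, `recent` and `fresh` are invented for illustration; they are not measurements from this thread.

```shell
#!/bin/sh
# Hypothetical pack sizes in MB; none of these numbers come from the thread.
p1=40 p2=10 p3=5	# the three existing packs
p4=3			# a fourth incremental holding only the loose objects
p3_new=7		# a replacement for p3 that also holds the loose objects
single=45		# everything repacked into one pack

# transfer_cost STRATEGY CLIENT
# Prints the MB a client must download, assuming it fetches every
# pack it lacks in full.  Clients: early (has p1), recent (has
# p1..p3), fresh (has nothing).
transfer_cost() {
	case "$1,$2" in
	single,*)	echo "$single" ;;
	fourth,early)	echo $((p2 + p3 + p4)) ;;
	fourth,recent)	echo "$p4" ;;
	fourth,fresh)	echo $((p1 + p2 + p3 + p4)) ;;
	replace,early)	echo $((p2 + p3_new)) ;;
	replace,recent)	echo "$p3_new" ;;
	replace,fresh)	echo $((p1 + p2 + p3_new)) ;;
	esac
}

for strategy in single fourth replace
do
	for client in early recent fresh
	do
		echo "$strategy $client $(transfer_cost "$strategy" "$client")"
	done
done
```

With these invented sizes, the single pack costs every client 45 MB, while the two incremental strategies cost the up-to-date client only 3 or 7 MB but the newcomer 58 or 57 MB. Different sizes shift the balance, which is the point of the discussion.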
* [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 16:48     ` Junio C Hamano
@ 2005-08-29 17:34       ` Junio C Hamano
  2005-08-29 18:29         ` A Large Angry SCM
  2005-08-29 18:59         ` Frank Sorenson
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 17:34 UTC
To: Frank Sorenson; +Cc: Git Mailing List

This originally came from Frank Sorenson but with a bit of
rework to allow future enhancement to the command without
changing the external interface for the removal part.

With the '-a' option, all objects in the current repository are
packed into a single pack.  When the '-d' option is given at the
same time, existing packs that were made redundant by this round
of repacking are deleted.

Since we currently have only two repacking strategies, one '-a'
(everything into one) and the other not '-a' (incrementally pack
only the unpacked ones), '-d' is meaningful only when used with
'-a', and for now removes all the packs that existed before
repacking.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Junio C Hamano <junkio@cox.net> writes:

> I am not saying we should not remove old packs.  I am saying that
> repacking, choosing which pack to remove and doing the actual
> removing should be kept as separate steps and in separate
> commands, perhaps the latter two as part of "git prune".

Frank, this is what I meant by the above.

When we have pack redundancy detection and removal in "git
prune", probably we would call it when '-d' is given instead of
rolling our own here.  That is, "git repack [-a] -d" would be
just a shorthand to say "repack" without '-d' immediately
followed by "git prune --redundant-packs".

The pack optimization idea itself might turn out to be not worth
it, in which case this version would suffice.  I don't know.

 git-repack-script |   51 +++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 43 insertions(+), 8 deletions(-)

0c3d34bce4c44640a606e47c6346c400bc353604
diff --git a/git-repack-script b/git-repack-script
--- a/git-repack-script
+++ b/git-repack-script
@@ -5,28 +5,63 @@
 . git-sh-setup-script || die "Not a git archive"
-no_update_info=
+no_update_info= all_into_one= remove_redundant=
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
 	-n)	no_update_info=t ;;
+	-a)	all_into_one=t ;;
+	-d)	remove_redundant=t ;;
 	*)	break ;;
 	esac
 	shift
 done
 rm -f .tmp-pack-*
-packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
-	git-pack-objects --non-empty --incremental .tmp-pack) ||
+PACKDIR="$GIT_OBJECT_DIRECTORY/pack"
+
+# There will be more repacking strategies to come...
+case ",$all_into_one," in
+,,)
+	rev_list='--unpacked'
+	rev_parse='--all'
+	pack_objects='--incremental'
+	;;
+,t,)
+	rev_list=
+	rev_parse='--all'
+	pack_objects=
+	# This part is a stop-gap until we have a proper pack redundancy
+	# checker.
+	existing=`cd "$PACKDIR" && \
+	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
+	;;
+esac
+name=$(git-rev-list --objects $rev_list $(git-rev-parse $rev_parse) |
+	git-pack-objects --non-empty $pack_objects .tmp-pack) ||
 	exit 1
-if [ -z "$packname" ]; then
-	echo Nothing new to pack
+if [ -z "$name" ]; then
+	echo Nothing new to pack.
 	exit 0
 fi
+echo "Pack pack-$name created."
+
+mkdir -p "$PACKDIR" || exit
+
+mv .tmp-pack-$name.pack "$PACKDIR/pack-$name.pack" &&
+mv .tmp-pack-$name.idx  "$PACKDIR/pack-$name.idx" ||
+exit
+
+if test "$remove_redundant" = t
+then
+	# We know $existing are all redundant only when
+	# all-into-one is used.
+	if test "$all_into_one" != '' && test "$existing" != ''
+	then
+		( cd "$PACKDIR" && rm -f $existing )
+	fi
+fi
-mkdir -p "$GIT_OBJECT_DIRECTORY/pack" &&
-mv .tmp-pack-$packname.pack "$GIT_OBJECT_DIRECTORY/pack/pack-$packname.pack" &&
-mv .tmp-pack-$packname.idx "$GIT_OBJECT_DIRECTORY/pack/pack-$packname.idx" &&
 case "$no_update_info" in
 t) : ;;
 *) git-update-server-info ;;

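[Editorial illustration, not part of the thread.] The ordering that makes '-a -d' safe in the patch above is: record the packs that exist, create the new pack, then delete only what was recorded. The sketch below replays that ordering in a scratch directory. It runs no git commands; empty dummy files stand in for packs, and the helper names `list_existing_packs` and `remove_recorded_packs` are made up for this example.

```shell
#!/bin/sh
# Sketch only: dummy files stand in for real packs; no git commands are run.

# list_existing_packs DIR
# Prints the pack and index files currently present in DIR.
list_existing_packs() {
	( cd "$1" && find . -type f \( -name '*.pack' -o -name '*.idx' \) -print )
}

# remove_recorded_packs DIR LIST
# Removes the files named in LIST (whitespace-separated, relative to DIR).
remove_recorded_packs() {
	# $2 is deliberately unquoted so each recorded name becomes one argument.
	( cd "$1" && rm -f $2 )
}

demo_dir=$(mktemp -d) || exit 1
packdir="$demo_dir/pack"
mkdir -p "$packdir" || exit 1

# Two pre-existing "packs".
: >"$packdir/pack-old1.pack"; : >"$packdir/pack-old1.idx"
: >"$packdir/pack-old2.pack"; : >"$packdir/pack-old2.idx"

# Step 1: take the snapshot before the new pack exists.
existing=$(list_existing_packs "$packdir")

# Step 2: the all-into-one pack lands in the same directory.
: >"$packdir/pack-new.pack"; : >"$packdir/pack-new.idx"

# Step 3: delete only what was recorded in step 1.
remove_recorded_packs "$packdir" "$existing"

ls "$packdir"
rm -rf "$demo_dir"
```

Because the snapshot is taken before the new pack is moved into place, only the pack-new files remain at the end. Taking the snapshot after step 2 would delete the freshly created pack as well.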
* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 17:34       ` Junio C Hamano
@ 2005-08-29 18:29         ` A Large Angry SCM
  2005-08-29 18:44           ` Junio C Hamano
  2005-08-29 18:59         ` Frank Sorenson
  1 sibling, 1 reply; 10+ messages in thread
From: A Large Angry SCM @ 2005-08-29 18:29 UTC
To: Frank Sorenson; +Cc: Git Mailing List

Junio C Hamano wrote:
> This originally came from Frank Sorenson but with a bit of
> rework to allow future enhancement to the command without
> changing the external interface for the removal part.
>
> With the '-a' option, all objects in the current repository are
> packed into a single pack.  When the '-d' option is given at the
> same time, existing packs that were made redundant by this round
> of repacking are deleted.
>
> Since we currently have only two repacking strategies, one '-a'
> (everything into one) and the other not '-a' (incrementally pack
> only the unpacked ones), '-d' is meaningful only when used with
> '-a', and for now removes all the packs that existed before
> repacking.
>
[Rest of updated patch snipped]

Frank,

Can you produce a patch to update the git-repack-script documentation to
reflect the new functionality?

* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 18:29         ` A Large Angry SCM
@ 2005-08-29 18:44           ` Junio C Hamano
  2005-08-29 18:57             ` A Large Angry SCM
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 18:44 UTC
To: gitzilla; +Cc: Frank Sorenson, git

A Large Angry SCM <gitzilla@gmail.com> writes:

> Frank,
>
> Can you produce a patch to update the git-repack-script documentation to
> reflect the new functionality?

Not including the doc changes in the patch was my fault, but the
message was meant primarily as an explanation of what I meant,
not for immediate inclusion in the master branch.

I have some other documentation updates sitting in the proposed
updates, so I'd do it myself along with other manual pages if
you and Frank do not mind.

In any case, I first would like to make sure that the proposed
patch you are replying to is something Frank agrees to.

* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 18:44           ` Junio C Hamano
@ 2005-08-29 18:57             ` A Large Angry SCM
  2005-08-29 19:44               ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: A Large Angry SCM @ 2005-08-29 18:57 UTC
To: Junio C Hamano; +Cc: Frank Sorenson, git

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
>
>>Frank,
>>
>>Can you produce a patch to update the git-repack-script documentation to
>>reflect the new functionality?
>
> Not including the doc changes in the patch was my fault, but the
> message was meant primarily as an explanation of what I meant,
> not for immediate inclusion in the master branch.
>
> I have some other documentation updates sitting in the proposed
> updates, so I'd do it myself along with other manual pages if
> you and Frank do not mind.
>
> In any case, I first would like to make sure that the proposed
> patch you are replying to is something Frank agrees to.
>

I sent my request to Frank because he was/is the sponsor of the change
but anyone can provide the documentation. :-)

I think it'd be a good idea for documentation updates to accompany all
patches (and for the maintainer to not be shy about asking for them).

Just my $0.02 as I look at which commands have no documentation.

* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 18:57             ` A Large Angry SCM
@ 2005-08-29 19:44               ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 19:44 UTC
To: gitzilla; +Cc: Frank Sorenson, git

A Large Angry SCM <gitzilla@gmail.com> writes:

> ... (and for the maintainer to not be shy about asking for
> them).

Point taken.

* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 17:34       ` Junio C Hamano
  2005-08-29 18:29         ` A Large Angry SCM
@ 2005-08-29 18:59         ` Frank Sorenson
  1 sibling, 0 replies; 10+ messages in thread
From: Frank Sorenson @ 2005-08-29 18:59 UTC
To: Junio C Hamano; +Cc: Git Mailing List

Junio C Hamano wrote:
> This originally came from Frank Sorenson but with a bit of
> rework to allow future enhancement to the command without
> changing the external interface for the removal part.
>
> With the '-a' option, all objects in the current repository are
> packed into a single pack.  When the '-d' option is given at the
> same time, existing packs that were made redundant by this round
> of repacking are deleted.
>
> Since we currently have only two repacking strategies, one '-a'
> (everything into one) and the other not '-a' (incrementally pack
> only the unpacked ones), '-d' is meaningful only when used with
> '-a', and for now removes all the packs that existed before
> repacking.

Thank you for explaining the reasoning, and reworking the patch.  This
does make more sense, and I can see the logic for leaving the packs
around.  Coming from the perspective of the end user, I would probably
want to repack quite a bit more often to take advantage of the size and
speed advantages, while large public repositories will probably want to
repack at much longer intervals.  Thanks for seeing both perspectives.

I like your updated patch.

Frank
-- 
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
