git.vger.kernel.org archive mirror
* [PATCH] git-repack-script: Add option to repack all objects
@ 2005-08-27  8:41 Frank Sorenson
  2005-08-28 21:06 ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Sorenson @ 2005-08-27  8:41 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This patch adds an option to git-repack-script to repack all objects,
including both packed and unpacked.  This allows a full repack of
a git archive (current cogito packs from 39MB to 4.5MB, and git packs
from 4.4MB to 3.8MB).

Signed-off-by: Frank Sorenson <frank@tuxrocks.com>

diff --git a/git-repack-script b/git-repack-script
- --- a/git-repack-script
+++ b/git-repack-script
@@ -5,10 +5,12 @@
 
 . git-sh-setup-script || die "Not a git archive"
 	
+repack_all=
 no_update_info=
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
+	--all)	repack_all=t ;;
 	-n)	no_update_info=t ;;
 	*)	break ;;
 	esac
@@ -16,13 +18,22 @@ do
 done
 
 rm -f .tmp-pack-*
- -packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
- -	git-pack-objects --non-empty --incremental .tmp-pack) ||
- -	exit 1
- -if [ -z "$packname" ]; then
- -	echo Nothing new to pack
- -	exit 0
- -fi
+case "$repack_all" in
+t)	packname=$(git-rev-list --objects $(git-rev-parse --all) |
+		git-pack-objects .tmp-pack) ||
+		exit 1
+	find "$GIT_OBJECT_DIRECTORY/"?? -type f | xargs rm -f
+	find "$GIT_OBJECT_DIRECTORY/pack" -type f | xargs rm -f
+	;;
+*)	packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
+		git-pack-objects --non-empty --incremental .tmp-pack) ||
+		exit 1
+	if [ -z "$packname" ]; then
+		echo Nothing new to pack
+		exit 0
+	fi
+	;;
+esac
 
 mkdir -p "$GIT_OBJECT_DIRECTORY/pack" &&
 mv .tmp-pack-$packname.pack "$GIT_OBJECT_DIRECTORY/pack/pack-$packname.pack" &&


Frank
- -- 
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDECcnaI0dwg4A47wRAsigAKDEItbKTKAeO+PO8VV0dtMvFl0qfgCffyDc
hL0nAUB0HxeDlDoh9fv2m4o=
=r4gM
-----END PGP SIGNATURE-----


* Re: [PATCH] git-repack-script: Add option to repack all objects
  2005-08-27  8:41 [PATCH] git-repack-script: Add option to repack all objects Frank Sorenson
@ 2005-08-28 21:06 ` Junio C Hamano
  2005-08-29  7:41   ` Frank Sorenson
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-08-28 21:06 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Git Mailing List

Frank Sorenson <frank@tuxrocks.com> writes:

> This patch adds an option to git-repack-script to repack all objects,
> including both packed and unpacked.  This allows a full repack of
> a git archive (current cogito packs from 39MB to 4.5MB, and git packs
> from 4.4MB to 3.8MB).
>
> Signed-off-by: Frank Sorenson <frank@tuxrocks.com>

While I agree that giving more flexibility to repack objects is
a good idea, I am not sure rolling all existing objects into one
pack and removing the existing ones is a good way to go.

I'd do this slightly differently.  I do not think removing
existing packs belongs in this command.  We would probably want a
separate tool to find extra/redundant packs and remove them, or
more generally optimize packs by selectively exploding them and
repacking them ("pack optimizer").
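
As a rough illustration of the redundancy check suggested above,
here is a toy sketch.  It is not an existing tool: each "pack" is
just a text file listing made-up object names, and the file names
(pack-1, pack-2, pack-all) are invented for the example.  A pack
is flagged when every object in it also appears in some other
pack.

```shell
# Toy model: each file lists the object names held by one pack.
# All names and contents here are hypothetical.
tmp=$(mktemp -d) || exit 1
printf '%s\n' aaa bbb         >"$tmp/pack-1"
printf '%s\n' ccc             >"$tmp/pack-2"
printf '%s\n' aaa bbb ccc ddd >"$tmp/pack-all"

redundant=
for p in "$tmp"/pack-*
do
	# Objects in this pack.
	sort -u "$p" >"$tmp/mine"
	# Objects in every other pack.
	for q in "$tmp"/pack-*
	do
		test "$q" = "$p" || cat "$q"
	done | sort -u >"$tmp/others"
	# Objects found only in this pack; empty means redundant.
	only_here=$(comm -23 "$tmp/mine" "$tmp/others")
	if test -z "$only_here"
	then
		redundant="$redundant${redundant:+ }${p##*/}"
	fi
done
echo "Redundant: $redundant"
```

Note that this naive check looks at each pack independently.  Two
packs with identical contents would both be flagged, and removing
both would lose objects, so a real tool would have to remove packs
one at a time and re-check.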


* Re: [PATCH] git-repack-script: Add option to repack all objects
  2005-08-28 21:06 ` Junio C Hamano
@ 2005-08-29  7:41   ` Frank Sorenson
  2005-08-29 16:48     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Sorenson @ 2005-08-29  7:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Junio C Hamano wrote:
> Frank Sorenson <frank@tuxrocks.com> writes:
> 
>>This patch adds an option to git-repack-script to repack all objects,
>>including both packed and unpacked.  This allows a full repack of
>>a git archive (current cogito packs from 39MB to 4.5MB, and git packs
>>from 4.4MB to 3.8MB).
>>
>>Signed-off-by: Frank Sorenson <frank@tuxrocks.com>
> 
> 
> While I agree that giving more flexibility to repack objects is
> a good idea, I am not sure rolling all existing objects into one
> pack and removing the existing one is a good way to go.

It reduces the disk space requirement significantly (linux packs from
135MB to 73MB), and I'm seeing speed improvements as well (probably
because cache-cold operation requires far less seeking, and the caching
requirements are smaller).

What are the benefits to keeping old packs?

> I'd do this slightly differently.  I do not think removing
> existing pack belongs to this command.  We would probably want a
> separate tool to find extra/redundant packs and remove them, or
> more generally optimize packs by selectively exploding them and
> repacking them ("pack optimizer").

I disagree about not removing old packs.  When you "repack" your
suitcase, you take everything out and put it back in again, so a command
named "repack" should remove all existing objects, and put them back again.

Okay, so the pack algorithm could be better, but that only means that
repacking the entire set of objects would improve things more, making
some sort of "git-repack-all" an even more valuable operation.

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDErwnaI0dwg4A47wRAkVGAKDqDjQ5IBTO+DC/nKpYl+69w7RESgCg6omQ
xwbQqnXJnfxITC1TAjRtLSk=
=tCyP
-----END PGP SIGNATURE-----


* Re: [PATCH] git-repack-script: Add option to repack all objects
  2005-08-29  7:41   ` Frank Sorenson
@ 2005-08-29 16:48     ` Junio C Hamano
  2005-08-29 17:34       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 16:48 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Git Mailing List

Frank Sorenson <frank@tuxrocks.com> writes:

> It reduces the disk space requirement significantly (linux packs from
> 135MB to 73MB), and I'm seeing speed improvements as well (probably
> because cache-cold operation requires far less seeking, and the caching
> requirements are smaller).
>
> What are the benefits to keeping old packs?

For a private repository that one develops in and pushes to
public repositories from, packing everything into one
pack and pruning everything else (including old packs) is always
the optimum thing, if one can afford the time to repack.

There are no benefits to _keeping_ old packs, but there may be
benefits to not packing everything into one huge one when other
people are involved.

Suppose I currently have three packs (one since the beginning of
time to some time ago, one incremental on top of it, another
incremental on top of the other two).  Somebody cloned from my
repository reasonably early in the project timeline (he has only
the first pack), somebody else cloned yesterday (has all three
packs).  And "git count-objects" reports many other objects are
unpacked and I decide it is time to repack.

At this point I could roll everything into one new big pack
and remove old packs.  Or I could create the fourth incremental.
Another possibility, which is what I currently do by hand,
is to create a pack that is incremental on top of the first two,
and replace the latest incremental with it.

Now these two people want to fetch from my repository while the
third person wants to clone from scratch.  Which repacking
strategy gives the best transfer to these three people?  Having
a single huge pack favors the newcomer and penalizes the old
timers.  In particular, the current http-pull is not smart enough
to pick a better pack when an object is found in more than one
pack, so leaving old packs around would not help.

Leaving the old packs around could help all of them.  In the
above example, I could create the fourth incremental _and_ a
superpack that has everything in it.  The newcomer would slurp
in the superpack, the one with only the first pack can use one
of the second+third+fourth or the superpack, and the one with
all three can use the fourth pack.
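
The trade-off above can be modelled with a toy sketch.  It
assumes a fetcher that downloads whole packs and simply takes
every server pack the client does not already have; that is a
simplification for illustration, not a description of how
http-pull actually chooses.  The pack names and the function
name missing_packs are hypothetical.

```shell
# Hypothetical pack names; real packs are named pack-<sha1>.pack.
# This is the "keep the old packs, add a fourth incremental" layout.
server_packs="pack-1 pack-2 pack-3 pack-4"

# Print the server packs that are not in the client's list ($1).
missing_packs () {
	have=" $1 " missing=
	for p in $server_packs
	do
		case "$have" in
		*" $p "*) ;;
		*) missing="$missing${missing:+ }$p" ;;
		esac
	done
	echo "$missing"
}

echo "early cloner fetches:     $(missing_packs "pack-1")"
echo "yesterday's cloner fetches: $(missing_packs "pack-1 pack-2 pack-3")"
echo "newcomer fetches:         $(missing_packs "")"
```

With a single superpack replacing everything, server_packs would
hold one new name and all three clients would fetch it in full,
which is the penalty for the old timers described above.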

Having said that, packing has an interesting compression
characteristic.  Repacking the three existing packs (from the
example) along with the unpacked objects into one pack would
result in a very small pack, compared to the sum of three
existing packs, depending on how often you repack.  In that
sense, it may not be such a big deal to force everybody to
re-fetch everything even if most of them are already locally
available, by repacking everything into one.

> I disagree about not removing old packs.

I am not saying we should not remove old packs.  I am saying that
repacking, choosing which pack to remove and doing the actual
removing should be kept as separate steps and in separate
commands, perhaps the latter two as part of "git prune".

-jc


* [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 16:48     ` Junio C Hamano
@ 2005-08-29 17:34       ` Junio C Hamano
  2005-08-29 18:29         ` A Large Angry SCM
  2005-08-29 18:59         ` Frank Sorenson
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 17:34 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Git Mailing List

This originally came from Frank Sorenson but with a bit of
rework to allow future enhancement to the command without
changing the external interface for the removal part.

With the '-a' option, all objects in the current repository are
packed into a single pack.  When the '-d' option is given at the
same time, existing packs that were made redundant by this round
of repacking are deleted.

Since we currently have only two repacking strategies, one '-a'
(everything into one) and the other not '-a' (incrementally pack
only the unpacked ones), '-d' is meaningful only when used with
'-a', and for now it removes all the packs that existed before
repacking.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

    Junio C Hamano <junkio@cox.net> writes:
    > I am not saying we should not remove old pack.  I am saying that
    > repacking, choosing which pack to remove and doing the actual
    > removing should be kept as separate steps and in separate
    > commands, perhaps the latter two as part of "git prune".

    Frank, this is what I meant by the above.  When we have pack
    redundancy detection and removal in "git prune", probably we
    would call it when '-d' is given instead of rolling our own
    here.  That is, "git repack [-a] -d" would be just a
    shorthand to say "repack" without '-d' immediately followed
    by "git prune --redundant-packs".

    The pack optimization idea itself might turn out not to be
    worth it, in which case this version would suffice.  I don't know.
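
The stop-gap used by '-a -d' in the patch below can be tried out
in isolation.  This is only a sketch under assumptions: it runs
in a scratch directory, the pack file names are made up, and a
plain "touch" stands in for the real repack step.  It shows the
order of operations: record the existing packs first, create the
new pack, then remove only what was recorded.

```shell
# Simulated pack directory; the file names are invented.
PACKDIR=$(mktemp -d) || exit 1
touch "$PACKDIR/pack-old1.pack" "$PACKDIR/pack-old1.idx" \
      "$PACKDIR/pack-old2.pack" "$PACKDIR/pack-old2.idx"

# 1. Remember which packs exist before repacking.
existing=$(cd "$PACKDIR" &&
	find . -type f \( -name '*.pack' -o -name '*.idx' \) -print)

# 2. Stand-in for the repack step, which would write a new pack.
touch "$PACKDIR/pack-new.pack" "$PACKDIR/pack-new.idx"

# 3. Remove only the packs recorded in step 1.
( cd "$PACKDIR" && rm -f $existing )

ls "$PACKDIR"
```

Because the list is taken before the new pack is moved into
place, the new pack can never be among the files removed.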

 git-repack-script |   51 +++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 43 insertions(+), 8 deletions(-)

0c3d34bce4c44640a606e47c6346c400bc353604
diff --git a/git-repack-script b/git-repack-script
--- a/git-repack-script
+++ b/git-repack-script
@@ -5,28 +5,63 @@
 
 . git-sh-setup-script || die "Not a git archive"
 	
-no_update_info=
+no_update_info= all_into_one= remove_redundant=
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
 	-n)	no_update_info=t ;;
+	-a)	all_into_one=t ;;
+	-d)	remove_redundant=t ;;
 	*)	break ;;
 	esac
 	shift
 done
 
 rm -f .tmp-pack-*
-packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) |
-	git-pack-objects --non-empty --incremental .tmp-pack) ||
+PACKDIR="$GIT_OBJECT_DIRECTORY/pack"
+
+# There will be more repacking strategies to come...
+case ",$all_into_one," in
+,,)
+	rev_list='--unpacked'
+	rev_parse='--all'
+	pack_objects='--incremental'
+	;;
+,t,)
+	rev_list=
+	rev_parse='--all'
+	pack_objects=
+	# This part is a stop-gap until we have proper pack redundancy
+	# checker.
+	existing=`cd "$PACKDIR" && \
+	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
+	;;
+esac
+name=$(git-rev-list --objects $rev_list $(git-rev-parse $rev_parse) |
+	git-pack-objects --non-empty $pack_objects .tmp-pack) ||
 	exit 1
-if [ -z "$packname" ]; then
-	echo Nothing new to pack
+if [ -z "$name" ]; then
+	echo Nothing new to pack.
 	exit 0
 fi
+echo "Pack pack-$name created."
+
+mkdir -p "$PACKDIR" || exit
+
+mv .tmp-pack-$name.pack "$PACKDIR/pack-$name.pack" &&
+mv .tmp-pack-$name.idx  "$PACKDIR/pack-$name.idx" ||
+exit
+
+if test "$remove_redundant" = t
+then
+	# We know $existing are all redundant only when
+	# all-into-one is used.
+	if test "$all_into_one" != '' && test "$existing" != ''
+	then
+		( cd "$PACKDIR" && rm -f $existing )
+	fi
+fi
 
-mkdir -p "$GIT_OBJECT_DIRECTORY/pack" &&
-mv .tmp-pack-$packname.pack "$GIT_OBJECT_DIRECTORY/pack/pack-$packname.pack" &&
-mv .tmp-pack-$packname.idx  "$GIT_OBJECT_DIRECTORY/pack/pack-$packname.idx" &&
 case "$no_update_info" in
 t) : ;;
 *) git-update-server-info ;;


* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 17:34       ` Junio C Hamano
@ 2005-08-29 18:29         ` A Large Angry SCM
  2005-08-29 18:44           ` Junio C Hamano
  2005-08-29 18:59         ` Frank Sorenson
  1 sibling, 1 reply; 10+ messages in thread
From: A Large Angry SCM @ 2005-08-29 18:29 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Git Mailing List

Junio C Hamano wrote:
> This originally came from Frank Sorenson but with a bit of
> rework to allow future enhancement to the command without
> changing the external interface for removal part.
> 
> With the '-a' option, all objects in the current repository are
> packed into a single pack.  When the '-d' option is given at the
> same time, existing packs that were made redundant by this round
> of repacking are deleted.
> 
> Since we currently have only two repacking strategies, one '-a'
> (everything into one) and the other not '-a' (incrementally pack
> only the unpacked ones), '-d' is meaningful only used with '-a'
> and removes all the existing packs before repacking for now.
> 
[Rest of updated patch snipped]

Frank,

Can you produce a patch to update the git-repack-script documentation to 
reflect the new functionality?


* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 18:29         ` A Large Angry SCM
@ 2005-08-29 18:44           ` Junio C Hamano
  2005-08-29 18:57             ` A Large Angry SCM
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 18:44 UTC (permalink / raw)
  To: gitzilla; +Cc: Frank Sorenson, git

A Large Angry SCM <gitzilla@gmail.com> writes:

> Frank,
>
> Can you produce a patch to update the git-repack-script documentation to 
> reflect the new functionality?

Not including the doc changes in the patch was my fault, but the
message was meant primarily as an explanation of what I meant,
not for immediate inclusion in the master branch.

I have some other documentation updates sitting in the proposed
updates, so I'd do it myself along with other manual pages if
you and Frank do not mind.

In any case, I first would like to make sure that the proposed
patch you are replying to is something Frank agrees to.


* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 18:44           ` Junio C Hamano
@ 2005-08-29 18:57             ` A Large Angry SCM
  2005-08-29 19:44               ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: A Large Angry SCM @ 2005-08-29 18:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Frank Sorenson, git

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
> 
>>Frank,
>>
>>Can you produce a patch to update the git-repack-script documentation to 
>>reflect the new functionality?
> 
> Not including the doc changes in the patch was my fault, but the
> message was meant primarily as an explanation of what I meant,
> not for immediate inclusion in the master branch.
> 
> I have some other documentation updates sitting in the proposed
> updates, so I'd do it myself along with other manual pages if
> you and Frank do not mind.
> 
> In any case, I first would like to make sure that the proposed
> patch you are replying to is something Frank agrees to.
> 

I sent my request to Frank because he was/is the sponsor of the change 
but anyone can provide the documentation. :-)

I think it'd be a good idea for documentation updates to accompany all 
patches (and for the maintainer to not be shy about asking for them).

Just my $0.02 as I look at which commands have no documentation.


* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 17:34       ` Junio C Hamano
  2005-08-29 18:29         ` A Large Angry SCM
@ 2005-08-29 18:59         ` Frank Sorenson
  1 sibling, 0 replies; 10+ messages in thread
From: Frank Sorenson @ 2005-08-29 18:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Junio C Hamano wrote:
> This originally came from Frank Sorenson but with a bit of
> rework to allow future enhancement to the command without
> changing the external interface for removal part.
> 
> With the '-a' option, all objects in the current repository are
> packed into a single pack.  When the '-d' option is given at the
> same time, existing packs that were made redundant by this round
> of repacking are deleted.
> 
> Since we currently have only two repacking strategies, one '-a'
> (everything into one) and the other not '-a' (incrementally pack
> only the unpacked ones), '-d' is meaningful only used with '-a'
> and removes all the existing packs before repacking for now.

Thank you for explaining the reasoning, and reworking the patch.  This
does make more sense, and I can see the logic for leaving around the
packs.  Coming from the perspective of the end user, I would probably
want to repack quite a bit more often to take advantage of the size and
speed advantages, while large public repositories will probably want to
repack at much longer intervals.  Thanks for seeing both perspectives.  I
like your updated patch.

Frank
- --
Frank Sorenson - KD7TZK
Systems Manager, Computer Science Department
Brigham Young University
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDE1r3aI0dwg4A47wRAldNAJ9J7wmyQMsMm5G0FgvOggc+QDtg/QCg0T+w
y6A/46LYEr1zhFgxK6uKX0I=
=z8uM
-----END PGP SIGNATURE-----


* Re: [PATCH] git-repack-script: Add option to repack all objects.
  2005-08-29 18:57             ` A Large Angry SCM
@ 2005-08-29 19:44               ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2005-08-29 19:44 UTC (permalink / raw)
  To: gitzilla; +Cc: Frank Sorenson, git

A Large Angry SCM <gitzilla@gmail.com> writes:

> ... (and for the maintainer to not be shy about asking for
> them).

Point taken.


end of thread, other threads:[~2005-08-29 19:44 UTC | newest]

Thread overview: 10+ messages
2005-08-27  8:41 [PATCH] git-repack-script: Add option to repack all objects Frank Sorenson
2005-08-28 21:06 ` Junio C Hamano
2005-08-29  7:41   ` Frank Sorenson
2005-08-29 16:48     ` Junio C Hamano
2005-08-29 17:34       ` Junio C Hamano
2005-08-29 18:29         ` A Large Angry SCM
2005-08-29 18:44           ` Junio C Hamano
2005-08-29 18:57             ` A Large Angry SCM
2005-08-29 19:44               ` Junio C Hamano
2005-08-29 18:59         ` Frank Sorenson
