git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] repack: allow simultaneous packing and pruning
@ 2006-10-10 10:14 Sam Vilain
  2006-10-10 11:04 ` Sam Vilain
  2006-10-10 15:03 ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Sam Vilain @ 2006-10-10 10:14 UTC (permalink / raw)
  To: git

If using git-repack -a, unreferenced objects are kept behind in the
pack.  This might be the best default, but there are no good ways
to clean up the packfiles if a lot of rebasing is happening, or
branches have been deleted.
---
see also http://colabti.de/irclogger/irclogger_log/git?date=2006-10-10,Tue&sel=27#l75

 Documentation/git-repack.txt |    7 ++++++-
 git-repack.sh                |   14 +++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 9516227..63ee7cb 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -9,7 +9,7 @@ objects into pack files.
 
 SYNOPSIS
 --------
-'git-repack' [-a] [-d] [-f] [-l] [-n] [-q]
+'git-repack' [-a] [-d] [-f] [-l] [-n] [-q] [-p]
 
 DESCRIPTION
 -----------
@@ -40,6 +40,11 @@ OPTIONS
 	existing packs redundant, remove the redundant packs.
 	Also runs gitlink:git-prune-packed[1].
 
+-p::
+	Before packing, remove any unreferenced objects with
+	gitlink:git-prune[1].  When used with '-a', unreferenced
+	objects in the old packs are not taken across.
+
 -l::
         Pass the `--local` option to `git pack-objects`, see
         gitlink:git-pack-objects[1].
diff --git a/git-repack.sh b/git-repack.sh
index 640ad8d..a2ad955 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -7,13 +7,14 @@ USAGE='[-a] [-d] [-f] [-l] [-n] [-q]'
 . git-sh-setup
 
 no_update_info= all_into_one= remove_redundant=
-local= quiet= no_reuse_delta= extra=
+local= quiet= no_reuse_delta= extra= prune=
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
 	-n)	no_update_info=t ;;
 	-a)	all_into_one=t ;;
 	-d)	remove_redundant=t ;;
+	-p)     prune=t ;;
 	-q)	quiet=-q ;;
 	-f)	no_reuse_delta=--no-reuse-delta ;;
 	-l)	local=--local ;;
@@ -32,6 +33,11 @@ case ",$all_into_one," in
 ,,)
 	rev_list='--unpacked'
 	pack_objects='--incremental'
+	if [ -n "$prune" ]
+	then
+	    # prune junk first
+	    git-prune
+	fi
 	;;
 ,t,)
 	rev_list=
@@ -40,8 +46,14 @@ case ",$all_into_one," in
 	# Redundancy check in all-into-one case is trivial.
 	existing=`cd "$PACKDIR" && \
 	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
+
+	if [ -n "$prune" ]
+	then
+	    rev_list=`cd "$GIT_DIR" && find refs -type f -print`
+	fi
 	;;
 esac
+
 pack_objects="$pack_objects $local $quiet $no_reuse_delta$extra"
 name=$(git-rev-list --objects --all $rev_list 2>&1 |
 	git-pack-objects --non-empty $pack_objects .tmp-pack) ||
-- 
1.4.2.g0ea2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 10:14 [PATCH] repack: allow simultaneous packing and pruning Sam Vilain
@ 2006-10-10 11:04 ` Sam Vilain
  2006-10-10 15:03 ` Linus Torvalds
  1 sibling, 0 replies; 9+ messages in thread
From: Sam Vilain @ 2006-10-10 11:04 UTC (permalink / raw)
  To: git

er, I guess I assumed that the critique about the behaviour was true
rather than checking it myself... this patch is mostly a null-op.  *and*
it's not whitespace clean!

Sam.

Sam Vilain wrote:
> If using git-repack -a, unreferenced objects are kept behind in the
> pack.  This might be the best default, but there are no good ways
> to clean up the packfiles if a lot of rebasing is happening, or
> branches have been deleted.
> ---
> see also http://colabti.de/irclogger/irclogger_log/git?date=2006-10-10,Tue&sel=27#l75
> 
>  Documentation/git-repack.txt |    7 ++++++-
>  git-repack.sh                |   14 +++++++++++++-
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
> index 9516227..63ee7cb 100644
> --- a/Documentation/git-repack.txt
> +++ b/Documentation/git-repack.txt
> @@ -9,7 +9,7 @@ objects into pack files.
>  
>  SYNOPSIS
>  --------
> -'git-repack' [-a] [-d] [-f] [-l] [-n] [-q]
> +'git-repack' [-a] [-d] [-f] [-l] [-n] [-q] [-p]
>  
>  DESCRIPTION
>  -----------
> @@ -40,6 +40,11 @@ OPTIONS
>  	existing packs redundant, remove the redundant packs.
>  	Also runs gitlink:git-prune-packed[1].
>  
> +-p::
> +	Before packing, remove any unreferenced objects with
> +	gitlink:git-prune[1].  When used with '-a', unreferenced
> +	objects in the old packs are not taken across.
> +
>  -l::
>          Pass the `--local` option to `git pack-objects`, see
>          gitlink:git-pack-objects[1].
> diff --git a/git-repack.sh b/git-repack.sh
> index 640ad8d..a2ad955 100755
> --- a/git-repack.sh
> +++ b/git-repack.sh
> @@ -7,13 +7,14 @@ USAGE='[-a] [-d] [-f] [-l] [-n] [-q]'
>  . git-sh-setup
>  
>  no_update_info= all_into_one= remove_redundant=
> -local= quiet= no_reuse_delta= extra=
> +local= quiet= no_reuse_delta= extra= prune=
>  while case "$#" in 0) break ;; esac
>  do
>  	case "$1" in
>  	-n)	no_update_info=t ;;
>  	-a)	all_into_one=t ;;
>  	-d)	remove_redundant=t ;;
> +	-p)     prune=t ;;
>  	-q)	quiet=-q ;;
>  	-f)	no_reuse_delta=--no-reuse-delta ;;
>  	-l)	local=--local ;;
> @@ -32,6 +33,11 @@ case ",$all_into_one," in
>  ,,)
>  	rev_list='--unpacked'
>  	pack_objects='--incremental'
> +	if [ -n "$prune" ]
> +	then
> +	    # prune junk first
> +	    git-prune
> +	fi
>  	;;
>  ,t,)
>  	rev_list=
> @@ -40,8 +46,14 @@ case ",$all_into_one," in
>  	# Redundancy check in all-into-one case is trivial.
>  	existing=`cd "$PACKDIR" && \
>  	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
> +
> +	if [ -n "$prune" ]
> +	then
> +	    rev_list=`cd "$GIT_DIR" && find refs -type f -print`
> +	fi
>  	;;
>  esac
> +
>  pack_objects="$pack_objects $local $quiet $no_reuse_delta$extra"
>  name=$(git-rev-list --objects --all $rev_list 2>&1 |
>  	git-pack-objects --non-empty $pack_objects .tmp-pack) ||

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 10:14 [PATCH] repack: allow simultaneous packing and pruning Sam Vilain
  2006-10-10 11:04 ` Sam Vilain
@ 2006-10-10 15:03 ` Linus Torvalds
  2006-10-10 19:46   ` Eran Tromer
  2006-10-10 20:24   ` Junio C Hamano
  1 sibling, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2006-10-10 15:03 UTC (permalink / raw)
  To: Sam Vilain; +Cc: git



On Tue, 10 Oct 2006, Sam Vilain wrote:
>
> If using git-repack -a, unreferenced objects are kept behind in the
> pack.  This might be the best default, but there are no good ways
> to clean up the packfiles if a lot of rebasing is happening, or
> branches have been deleted.

Don't do this.

I understand why you want to do it, but the fact is, it's dangerous.

Right now, "git repack" is actually safe to run even on a repository which 
is being modified! And that's actually important, if you have something 
like a shared repo that gets re-packed every once in a while from a 
cron-job!

So the refs might be up-dated as it runs, and if that happens, your 
pruning doesn't really do the right thing - it might consider a new loose 
object to be unreachable, because it didn't check whether the refs have 
changed since it read them so that it might actually _be_ reachable after 
all.

So please don't do this. 

It's important for operations to always think about "what happens if 
somebody does a 'commit' or pushes into the tree at the same time?".

For example, the "git prune-packed" that gets run afterwards is _not_ 
racy, because it will only prune objects that already exist in the pack.

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 15:03 ` Linus Torvalds
@ 2006-10-10 19:46   ` Eran Tromer
  2006-10-10 21:25     ` Linus Torvalds
  2006-10-10 20:24   ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Eran Tromer @ 2006-10-10 19:46 UTC (permalink / raw)
  To: Linus Torvalds, git

On 2006-10-10 17:03, Linus Torvalds wrote:
> On Tue, 10 Oct 2006, Sam Vilain wrote:
>> If using git-repack -a, unreferenced objects are kept behind in the
>> pack.  This might be the best default, but there are no good ways
>> to clean up the packfiles if a lot of rebasing is happening, or
>> branches have been deleted.
> 
> Don't do this.

Too late: "git repack -a -d" already does it, in contradiction to its
manpage. It creates a new pack by following .git/refs, and then deletes
all old pack files.

> I understand why you want to do it, but the fact is, it's dangerous.
> 
> Right now, "git repack" is actually safe to run even on a repository which 
> is being modified! And that's actually important, if you have something 
> like a shared repo that gets re-packed every once in a while from a 
> cron-job!

Don't run it on a shared repo, then. And grab a coffee while it runs.
But why force leaf repositories to accumulate garbage?

This functionality is just as racy, and just as necessary, as
"git-prune". It merely garbage-collects the packs as well. Git seems to
collect unreferenced objects faster than the space between the cushions
in my sofa, and there ought to be a way to tidy up things.

Linus, I see why you neither need nor want this functionality in your
typical workflow, but things look different for a downstream developer
who engages in a variety of garbage-generating activities like tracking
wild trees, rebasing patches and using stgit. I really don't need that
unreferenced copy of 2.6.15-rc2-mm1 in my packs anymore.

  Eran

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 15:03 ` Linus Torvalds
  2006-10-10 19:46   ` Eran Tromer
@ 2006-10-10 20:24   ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2006-10-10 20:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sam Vilain, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 10 Oct 2006, Sam Vilain wrote:
>>
>> If using git-repack -a, unreferenced objects are kept behind in the
>> pack.  This might be the best default, but there are no good ways
>> to clean up the packfiles if a lot of rebasing is happening, or
>> branches have been deleted.
>
> Don't do this.
>
> I understand why you want to do it, but the fact is, it's dangerous.

Sorry, I understand "it's dangerous" part, but I do not
understand "why you want to do it" part.

@@ -32,6 +33,11 @@ case ",$all_into_one," in
 ,,)
 	rev_list='--unpacked'
 	pack_objects='--incremental'
+	if [ -n "$prune" ]
+	then
+	    # prune junk first
+	    git-prune
+	fi
 	;;
 ,t,)
 	rev_list=

This shouldn't make any difference if the repository is
quiescent (and is dangerous if it isn't).  pack-objects will
not get fed things that are not reachable.

@@ -40,8 +46,14 @@ case ",$all_into_one," in
 	# Redundancy check in all-into-one case is trivial.
 	existing=`cd "$PACKDIR" && \
 	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
+
+	if [ -n "$prune" ]
+	then
+	    rev_list=`cd "$GIT_DIR" && find refs -type f -print`
+	fi
 	;;
 esac
+

We give --all to rev-list so this should not have any effect
either; other than that the code introduced by this hunk is
broken with packed-refs.

Isn't "repack -a -d" what Sam wants?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 19:46   ` Eran Tromer
@ 2006-10-10 21:25     ` Linus Torvalds
  2006-10-10 22:09       ` Eran Tromer
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2006-10-10 21:25 UTC (permalink / raw)
  To: Eran Tromer; +Cc: git



On Tue, 10 Oct 2006, Eran Tromer wrote:
> 
> Too late: "git repack -a -d" already does it, in contradiction to its 
> manpage. It creates a new pack by following .git/refs, and then deletes 
> all old pack files.

That's very different.

That just means that you should not try to do two _concurrent_ repacks. 

> Don't run it on a shared repo, then. And grab a coffee while it runs.
> But why force leaf repositories to accumulate garbage?

Nobody forces that.

You can run "git prune" if you want to. But at least we know that "git 
prune" is unsafe.

			Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 21:25     ` Linus Torvalds
@ 2006-10-10 22:09       ` Eran Tromer
  2006-10-10 22:27         ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Eran Tromer @ 2006-10-10 22:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 2006-10-10 23:25, Linus Torvalds wrote:
> On Tue, 10 Oct 2006, Eran Tromer wrote:
>> Too late: "git repack -a -d" already does it, in contradiction to its 
>> manpage. It creates a new pack by following .git/refs, and then deletes 
>> all old pack files.
> 
> That's very different.
> 
> That just means that you should not try to do two _concurrent_ repacks. 

How so? This process loses the unreferenced objects from the old packs,
where "referenced" is determined in a racy way. Same problem.

> 
>> Don't run it on a shared repo, then. And grab a coffee while it runs.
>> But why force leaf repositories to accumulate garbage?
> 
> Nobody forces that.
> 
> You can run "git prune" if you want to. But at least we know that "git 
> prune" is unsafe.

But "git prune" does not GC packs, only loose objects.

Anyway, I think the right thing to do is to make "git repack -a -d"
operate safely (not drop any objects), and add a new --prune option
so that "git repack -a -d --prune" does what "git repack -a -d" used to do.

  Eran

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 22:09       ` Eran Tromer
@ 2006-10-10 22:27         ` Linus Torvalds
  2006-10-10 23:45           ` Eran Tromer
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2006-10-10 22:27 UTC (permalink / raw)
  To: Eran Tromer; +Cc: git



On Wed, 11 Oct 2006, Eran Tromer wrote:
> 
> How so? This process loses the unreferenced objects from the old packs,
> where "referenced" is determined in a racy way. Same problem.

No.

Those unreferenced objects are old history that won't be part of any new 
history.

If you create new history, they won't be in the pack.

It's obviously possible that you create new history that has a blob that 
is equal to some old history (and no loose object will be created), but by 
then we're _really_ reaching. 

> But "git prune" does not GC packs, only loose objects.

Right. And you'd want to repack _and_ prune, but they should be kept 
separate, because one is safe, the other is not.

Of course, if the code were to check that no references have changed over 
the operation, then I wouldn't have any objections.

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] repack: allow simultaneous packing and pruning
  2006-10-10 22:27         ` Linus Torvalds
@ 2006-10-10 23:45           ` Eran Tromer
  0 siblings, 0 replies; 9+ messages in thread
From: Eran Tromer @ 2006-10-10 23:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 2006-10-11 00:27, Linus Torvalds wrote:
> Those unreferenced objects are old history that won't be part of any new 
> history.
> 
> If you create new history, they won't be in the pack.

... because git-repack moves only already-referenced objects to packs
(and once they're referenced a subsequent "git-repack -a -d" won't lose
them). Curiously, this critically depends on Documentation/git-repack
being wrong:

  This script is used to combine all objects that do not currently
  reside in a "pack", into a pack.

However, this means there is no safe way to create a new pack without
adding all its content as loose objects first.

For example, the following is racy because there's a point where the new
pack is on disk but not yet referenced:

$ git-fetch --keep foo &  git-repack -a -d


>> But "git prune" does not GC packs, only loose objects.
> 
> Right. And you'd want to repack _and_ prune, but they should be kept 
> separate, because one is safe, the other is not.

Ah, semantics.

The request was for removing unreferenced objects ("pruning") in *packs*
while doing the repacking. This turns out to be already implemented
(contrary to the docs) and, as you explained, safe.

Pruning both packed and loose objects while repacking is neither safe
nor requested (and is indeed roughly equivalent to just
"git-repack -a -d; git-prune").


  Eran

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2006-10-10 23:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-10 10:14 [PATCH] repack: allow simultaneous packing and pruning Sam Vilain
2006-10-10 11:04 ` Sam Vilain
2006-10-10 15:03 ` Linus Torvalds
2006-10-10 19:46   ` Eran Tromer
2006-10-10 21:25     ` Linus Torvalds
2006-10-10 22:09       ` Eran Tromer
2006-10-10 22:27         ` Linus Torvalds
2006-10-10 23:45           ` Eran Tromer
2006-10-10 20:24   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).