git.vger.kernel.org archive mirror
* [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
@ 2007-11-18 11:25 Johannes Sixt
  2007-11-18 18:39 ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Sixt @ 2007-11-18 11:25 UTC (permalink / raw)
  To: git

As you know, repo.or.cz uses alternates in order to reduce the space that the 
repositories of forked projects require.

Recently, it happened that a fork (4msysgit.git) became broken because it was 
using an object that was pruned away from the repository that it was 
borrowing from (mingw.git). This happened even though 4msysgit did not use 
the branch of mingw.git that was rebased and whose objects were pruned. The 
reason is that a merge in 4msysgit.git resulted in a blob that was also in 
the rebased branch.

To avoid such situations I propose to introduce "attic" packs. They contain 
objects that are unreachable by the local set of refs. Otherwise they are 
used like regular packs.

git-repack produces "attic" packs like this:

- Places objects of the local object store that are unreachable in an "attic" 
pack.
- Copies objects that are reachable but borrowed from an alternate and are 
only in the alternates' "attic" packs into the local regular pack.

git-prune removes "attic" packs.

Then the strategy of garbage collection can be arranged in the following way:

- Repack by starting at the "most complete" repo and work towards the "most 
borrowing" ones. During this phase "attic" packs are created. Borrowing repos 
get a chance to salvage objects before the alternates prune them away.

- Prune by starting at the "most borrowing" repo and work towards the "most 
complete" ones. During this phase the "attic" packs are cleaned up.
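
The two passes above can be sketched in shell as follows. This only illustrates the ordering: `plan_gc` is a hypothetical helper that prints the steps instead of running them, and the "attic"-aware behaviour of git-repack and git-prune is the proposal of this message, not an existing feature.

```shell
#!/bin/sh
# plan_gc REPO...  (hypothetical helper, not part of git)
# Arguments are repository names ordered from "most complete" to
# "most borrowing".  Prints the maintenance steps in the proposed order.
# Names containing whitespace are not supported by this sketch.
plan_gc () {
	reversed=
	# Pass 1: repack from the most complete repo towards the borrowers.
	for repo in "$@"
	do
		echo "repack $repo"
		reversed="$repo $reversed"
	done
	# Pass 2: prune in the opposite direction.
	for repo in $reversed
	do
		echo "prune $repo"
	done
}
```

Calling `plan_gc mingw.git 4msysgit.git` would list both repacks first (mingw.git, then 4msysgit.git) and then the prunes in reverse order.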

What do you think? Is this a way for a solution?

-- Hannes


* Re: [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
  2007-11-18 11:25 [RFC] Alternates and broken repos: A pack and prune scheme to avoid them Johannes Sixt
@ 2007-11-18 18:39 ` Junio C Hamano
  2007-11-18 20:01   ` Johannes Sixt
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2007-11-18 18:39 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <johannes.sixt@telecom.at> writes:

> Then the strategy of garbage collection can be arranged in the following way:
>
> - Repack by starting at the "most complete" repo and work towards the "most 
> borrowing" ones. During this phase "attic" packs are created. Borrowing repos 
> get a chance to salvage objects before the alternates prune them away.
>
> - Prune by starting at the "most borrowing" repo and work towards the "most 
> complete" ones. During this phase the "attic" packs are cleaned up.
>
> What do you think? Is this a way for a solution?

I would imagine that would work as long as it can be controlled
when all the involved repositories are repacked and pruned, such
as on repo.or.cz case (but on the other hand it is not really
controlled well there and that is the reason you wrote the
message X-<).


* Re: [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
  2007-11-18 18:39 ` Junio C Hamano
@ 2007-11-18 20:01   ` Johannes Sixt
  2007-11-18 20:10     ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Sixt @ 2007-11-18 20:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On Sunday 18 November 2007 19:39, Junio C Hamano wrote:
> Johannes Sixt <johannes.sixt@telecom.at> writes:
> > Then the strategy of garbage collection can be arranged in the following
> > way:
> >
> > - Repack by starting at the "most complete" repo and work towards the
> > "most borrowing" ones. During this phase "attic" packs are created.
> > Borrowing repos get a chance to salvage objects before the alternates
> > prune them away.
> >
> > - Prune by starting at the "most borrowing" repo and work towards the
> > "most complete" ones. During this phase the "attic" packs are cleaned up.
> >
> > What do you think? Is this a way for a solution?
>
> I would imagine that would work as long as it can be controlled
> when all the involved repositories are repacked and pruned, such
> as on repo.or.cz case (but on the other hand it is not really
> controlled well there and that is the reason you wrote the
> message X-<).

Well, I think in many situations pack and prune can be controlled. To be 
precise, if alternates are used pack and prune *must* be controlled. 
Currently, the control is very simple: "don't prune" (and I don't recall ATM 
what you must not do when you repack).

Anyway, judging from the responses so far it seems that people can live 
with "don't prune" (or not using alternates) ;-) Repositories getting broken 
this way isn't exactly my itch, either, so... I spelled out a possible 
solution if someone wants to pick up the topic.

-- Hannes


* Re: [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
  2007-11-18 20:01   ` Johannes Sixt
@ 2007-11-18 20:10     ` Junio C Hamano
  2007-11-29  3:41       ` [PATCH/RFC] Teach repack to optionally retain otherwise lost objects Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2007-11-18 20:10 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <johannes.sixt@telecom.at> writes:

> On Sunday 18 November 2007 19:39, Junio C Hamano wrote:
> ...
>> I would imagine that would work as long as it can be controlled
>> when all the involved repositories are repacked and pruned, such
>> as on repo.or.cz case (but on the other hand it is not really
>> controlled well there and that is the reason you wrote the
>> message X-<).
>
> Well, I think in many situations pack and prune can be controlled. To be 
> precise, if alternates are used pack and prune *must* be controlled.
> Currently, the control is very simple: "don't prune" (and I don't recall ATM 
> what you must not do when you repack).
>
> Anyway, judging from the responses so far it seems that people can live 
> with "don't prune" (or not using alternates) ;-)

Because my point was not "don't prune is good enough", I think
you are judging from too small a number of responses (in fact,
zero).

My point was that even the existing setup that is well known to
the public (i.e. repo.or.cz) does not seem to be controlled, and
adding a nicer mechanism (e.g. I do not think there currently is
a canned way to prepare a pack that contains only unreachable
objects --- you need to script it anew) for a better control may
not help the situation much, unless it is actually used.


* [PATCH/RFC] Teach repack to optionally retain otherwise lost objects
  2007-11-18 20:10     ` Junio C Hamano
@ 2007-11-29  3:41       ` Johannes Schindelin
  2007-11-29  6:15         ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2007-11-29  3:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git


When specifying --attic=<prefix>, the objects that would be lost when
calling repack with -d will be put into a packfile (or multiple
packfiles), using the file name prefix <prefix>.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	This implements the idea of Hannes.

	The plan for repo.or.cz is now to invoke repack with
	"--attic=attic" and copy attic-*.{idx,pack} to all the forks'
	object stores, then delete the original attic-*.{idx,pack}.

	The beauty of that approach is that the order in which the
	repositories are repacked is no longer important.

	This patch is marked RFC since there is a severe bottleneck
	here: the new pack's index is sorted and made unique and every
	SHA-1 displayed twice, then the old pack's index is sorted and
	made unique.  Then the combined result is sorted and only the
	now-unique SHA-1s are actually packed.

	(The sort is not necessary if there is only _one_ pack.
	However, we cannot guarantee that.)

	Of course, this is quick 'n dirty, and the price to be paid
	is a substantial performance hit: in my tests, linux-2.6.git
	needed half a second to show its pack's index, but that
	sed 's/^.* //' | sort | uniq | sed p mantra needs 19 seconds.
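
The counting trick behind that pipeline can be tried on plain text files. The sketch below assumes two files with one name per line; `only_in_old` is a hypothetical helper, and the real patch feeds the output of `git show-index` instead.

```shell
#!/bin/sh
# only_in_old OLD_LIST NEW_LIST  (hypothetical helper)
# Prints the lines that occur in OLD_LIST but not in NEW_LIST.
only_in_old () {
	{
		# Every name from the new list is emitted twice ...
		sort -u "$2" | sed p
		# ... and every name from the old list once.
		sort -u "$1"
	} | sort | uniq -u	# keep only names seen exactly once
}
```

A name that is only in the new list is seen twice, a name in both lists three times, and a name only in the old list once, so `uniq -u` leaves exactly the names that would be lost.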

	The obvious thing is to exploit the fact that the pack indices
	are already sorted:

	I started to patch git-show-index so that it takes an argument
	--missing-objects, followed by the new pack index file names,
	followed by --, followed by the old pack index file names.

	Then it would traverse all of them simultaneously, outputting
	only the SHA-1s of objects that are in an old pack, but not
	in any of the new packs.
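
The `--missing-objects` option described here is not part of the posted patch. As a rough illustration of the merge idea only, the standard `comm` utility performs the same kind of simultaneous traversal on two lists that are already sorted; `only_in_old_merge` is a hypothetical helper name.

```shell
#!/bin/sh
# only_in_old_merge OLD_LIST NEW_LIST  (hypothetical helper)
# Both inputs must already be sorted in the C locale.
# Prints the lines that are in OLD_LIST but not in NEW_LIST in one
# linear pass, without re-sorting either input.
only_in_old_merge () {
	# -2 drops lines unique to NEW_LIST, -3 drops lines common to both.
	LC_ALL=C comm -23 "$1" "$2"
}
```

Since `comm` takes exactly two inputs, several new (or old) lists would first have to be merged into one, for example with `sort -m -u`.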

	Two issues: there might be a whole lot of pack files (Pasky
	told me today that in one instance there were 416 pack files!)
	and that might well exceed the maximum number of open files.

	Second issue: there are two different pack index formats, and
	the code is not easily refactored AFAICT.

	Probably a better method would be not to read the files
	simultaneously, but fill a "struct decorate *" with objects
	(which can be faked, as we do not really need to parse them)
	of the new packs, and then only use decorate_lookup() to determine
	for all old packs' objects if they are present in the new ones.

	The latter approach would allow for a relatively easy refactoring
	of show-index.c; just provide a callback for each entry.

	However, I am way too tired today to do it.

	Besides, a completely different idea just struck me: before
	repacking, .git/objects/pack/* could be _hard linked_ to the
	forkee's object stores.  Then nothing in git-repack's code
	needs to be changed.
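
A minimal sketch of the hard-link step, assuming both pack directories live on the same filesystem. `link_packs` and its arguments are hypothetical; the script actually proposed for repo.or.cz may look different.

```shell
#!/bin/sh
# link_packs SRC_PACKDIR DST_PACKDIR  (hypothetical helper)
# Hard links every pack and index file from SRC_PACKDIR into
# DST_PACKDIR, skipping files that already exist there.
link_packs () {
	for f in "$1"/*.pack "$1"/*.idx
	do
		test -f "$f" || continue	# glob matched nothing
		dst="$2/$(basename "$f")"
		test -e "$dst" || ln "$f" "$dst" || return 1
	done
}
```

Because a hard link is just a second name for the same file, removing the pack from the source directory afterwards leaves the data reachable through the link in the destination.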

	Oh, well.  I just wasted 1.5 hours.

 Documentation/git-repack.txt |    4 +++
 git-repack.sh                |   40 +++++++++++++++++++++++++++++++++----
 t/t5303-repack-attic.sh      |   44 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 5 deletions(-)
 create mode 100644 t/t5303-repack-attic.sh

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 12e2079..ec2c2bf 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -84,6 +84,10 @@ OPTIONS
 	If specified,  multiple packfiles may be created.
 	The default is unlimited.
 
+--attic=<prefix>::
+	Put all objects that would/will be lost when running with `-d`
+	into its own packfile(s), with file name prefix `<prefix>`.
+
 
 Configuration
 -------------
diff --git a/git-repack.sh b/git-repack.sh
index e18eb3f..f83d6f0 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -18,12 +18,13 @@ window=         size of the window used for delta compression
 window-memory=  same as the above, but limit memory size instead of entries count
 depth=          limits the maximum delta depth
 max-pack-size=  maximum size of each packfile
+attic=          pack no-longer-packed used objects into an "attic" pack
 "
 SUBDIRECTORY_OK='Yes'
 . git-sh-setup
 
 no_update_info= all_into_one= remove_redundant= keep_unreachable=
-local= quiet= no_reuse= extra=
+local= quiet= no_reuse= extra= attic=
 while test $# != 0
 do
 	case "$1" in
@@ -37,6 +38,8 @@ do
 	-l)	local=--local ;;
 	--max-pack-size|--window|--window-memory|--depth)
 		extra="$extra $1=$2"; shift ;;
+	--attic)
+		attic="$2"; shift ;;
 	--) shift; break;;
 	*)	usage ;;
 	esac
@@ -119,6 +122,36 @@ for name in $names ; do
 	rm -f "$PACKDIR/old-pack-$name.pack" "$PACKDIR/old-pack-$name.idx"
 done
 
+new_existing=
+for e in $existing
+do
+	case " $fullbases " in
+	*" $e "*) ;;
+	*)
+		new_existing="$new_existing $e"
+		;;
+	esac
+done
+existing="$new_existing"
+
+if test ! -z "$attic"
+then
+	# Find the objects which were in the existing packs, but are no
+	# longer in the new ones.
+	#
+	# This could be much more efficient.
+	(for name in $names
+	 do
+		git show-index < "$PACKDIR/pack-$name.idx"
+	 done | sed 's/^.* //' | sort | uniq | sed p &&
+	 for e in $existing
+	 do
+		git show-index < "$PACKDIR/$e.idx"
+	 done | sed 's/^.* //' | sort | uniq) | sort | uniq -u |
+	git pack-objects --non-empty "$attic" ||
+		die "Could not create attic '$attic'."
+fi
+
 if test "$remove_redundant" = t
 then
 	# We know $existing are all redundant.
@@ -128,10 +161,7 @@ then
 		( cd "$PACKDIR" &&
 		  for e in $existing
 		  do
-			case " $fullbases " in
-			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
-			esac
+			rm -f "$e.pack" "$e.idx" "$e.keep"
 		  done
 		)
 	fi
diff --git a/t/t5303-repack-attic.sh b/t/t5303-repack-attic.sh
new file mode 100644
index 0000000..9777748
--- /dev/null
+++ b/t/t5303-repack-attic.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='repack with an attic pack'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+
+	echo "Ten weary, footsore wanderers," > file &&
+	git add file &&
+	test_tick &&
+	git commit -m initial &&
+	echo "all in a woeful plight," >> file &&
+	test_tick &&
+	git commit -m second file &&
+	echo "sought shelter in a wayside-inn" >> file &&
+	test_tick &&
+	git commit -m third file &&
+	echo "one dark and lonely night." >> file &&
+	test_tick &&
+	git commit -m fourth file &&
+	echo ">>Nine rooms, no more<<, the landlord said," >> file &&
+	test_tick && git commit -m fifth file &&
+	git repack -a -d &&
+	! ls .git/objects/??/*
+
+'
+
+test_expect_success 'create attic pack' '
+
+	LAST_VERSION=$(git rev-parse --verify HEAD:file) &&
+	git reset --hard HEAD^ &&
+	rm .git/logs/HEAD .git/logs/refs/heads/master &&
+	git cat-file blob $LAST_VERSION &&
+	git repack --attic=attic -a -d &&
+	! git cat-file blob $LAST_VERSION &&
+	test -f attic-*.idx &&
+	cat attic-*.idx | git show-index | grep $LAST_VERSION
+
+'
+
+test_done
-- 
1.5.3.6.2066.g09421


* Re: [PATCH/RFC] Teach repack to optionally retain otherwise lost objects
  2007-11-29  3:41       ` [PATCH/RFC] Teach repack to optionally retain otherwise lost objects Johannes Schindelin
@ 2007-11-29  6:15         ` Junio C Hamano
  2007-11-29 11:57           ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2007-11-29  6:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> 	Besides, a completely different idea just struck me: before
> 	repacking, .git/objects/pack/* could be _hard linked_ to the
> 	forkee's object stores.  Then nothing in git-repack's code
> 	needs to be changed.
>
> 	Oh, well.  I just wasted 1.5 hours.

Your 1.5 hours was spent wisely to come up with that idea ;-).

To make sure I understand your idea correctly, the procedure to repack a
repository in a fork-friendly way is:

 (1) find the project directly forked from you;

 (2) hardlink all packs under your object store to their object store;

 (3) repack -a -l and prune.

I think that would work as long as you do the above as a unit and handle
one repository at a time.  Otherwise I think you risk losing necessary
objects when hierarchical forks are involved.  E.g.  if you have a
project X that has a fork Y which in turn has fork Z.

	* Step 1 is run for X, Y and Z.
        * Step 2 is run for Y and Z.
        * Step 3 is run for Z.

At this point, Z is still borrowing objects from Y and X through Y, and
it will not keep objects it is borrowing from X through Y.  Then if the
procedure is intermixed like this, a bad thing happens.

	* Step 2 is run for X.
	* Step 3 is run for Y.
	* Step 3 is run for X.

Step 3 for Y would lose objects Y was borrowing from X that were not
used by Y itself.  At this point, Z is still usable as the objects it is
borrowing from X through Y have not been pruned from X.  But Step 3 for X
will lose them, rendering Z unusable.


* Re: [PATCH/RFC] Teach repack to optionally retain otherwise lost objects
  2007-11-29  6:15         ` Junio C Hamano
@ 2007-11-29 11:57           ` Johannes Schindelin
  2007-11-29 14:21             ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2007-11-29 11:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git

Hi,

On Wed, 28 Nov 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > 	Besides, a completely different idea just struck me: before
> > 	repacking, .git/objects/pack/* could be _hard linked_ to the
> > 	forkee's object stores.  Then nothing in git-repack's code
> > 	needs to be changed.
> >
> > 	Oh, well.  I just wasted 1.5 hours.
> 
> Your 1.5 hours was spent wisely to come up with that idea ;-).

Thanks ;-)

> To make sure I understand your idea correctly, the procedure to repack a 
> repository in a fork-friendly way is:
> 
>  (1) find the project directly forked from you;
> 
>  (2) hardlink all packs under your object store to their object store;
> 
>  (3) repack -a -l and prune.

Yep.

> I think that would work as long as you do the above as a unit and handle
> one repository at a time.

Exactly.  See

http://repo.or.cz/w/repo.git?a=commitdiff;h=fba501deabd349afbe3b8bf89f385889889e04ac

for a tired proposal.

Note that "prune" is not (yet) an option, since it could possibly destroy 
objects which are needed in an ongoing push operation.

However, we could do exactly the same as with reflogs: introduce a grace 
period (with loose objects, we can use the ctime...)

> Otherwise I think you risk losing necessary objects when hierarchical 
> forks are involved.  E.g.  if you have a project X that has a fork Y 
> which in turn has fork Z.

Well, in theory you could also iterate over all projects and hard link the 
packs/objects of their alternates, and _then_ iterate and repack.  But it 
is simpler and more obvious in the case of repo.or.cz to do all in one 
iteration, because we can order the repository names easily so that 
forkees come first, _and_ we have an easy way to find out what are the 
forks of a project.
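
As a sketch of such an ordering, assume (hypothetically; the thread does not spell out the layout) that a fork's path extends the path of the project it was forked from, e.g. "x.git", "x/y.git", "x/y/z.git". Sorting by path depth then puts every forkee before its forks. `forkees_first` is an illustrative name only.

```shell
#!/bin/sh
# forkees_first  (hypothetical helper)
# Reads repository paths on stdin, one per line, and prints them with
# shallower paths first, so that a project precedes its forks under
# the assumed layout.  Paths containing tabs are not supported.
forkees_first () {
	# Prefix each path with its number of "/"-separated components,
	# sort numerically on that prefix, then strip it again.
	awk -F/ '{ print NF "\t" $0 }' |
	LC_ALL=C sort -n |
	cut -f2-
}
```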

Ciao,
Dscho


* [PATCH] Add "--expire <time>" option to 'git prune'
  2007-11-29 11:57           ` Johannes Schindelin
@ 2007-11-29 14:21             ` Johannes Schindelin
  2007-11-29 14:35               ` Johannes Sixt
                                 ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-11-29 14:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git, pasky


Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.

This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time.  For example, by

	git prune --expire 14.days

you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).

The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).
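
The age test alone can be approximated with standard tools. The sketch below is only an analogue of the mtime comparison: it does not check reachability, and `list_old_files` is a hypothetical helper, not something git provides.

```shell
#!/bin/sh
# list_old_files DIR DAYS  (hypothetical helper)
# Prints regular files under DIR whose mtime is more than DAYS full
# days in the past.  find(1) discards fractions of a day, so the
# cut-off is coarser than a comparison of raw timestamps.
list_old_files () {
	find "$1" -type f -mtime +"$2"
}
```

For example, `list_old_files .git/objects 14` would list candidate files by age only; deciding whether an object is actually unreachable still needs git itself.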

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	On Thu, 29 Nov 2007, Johannes Schindelin wrote:

	> Note that "prune" is not (yet) an option [for repo.or.cz], since 
	> it could possibly destroy objects which are needed in an ongoing 
	> push operation.
	> 
	> However, we could do exactly the same as with reflogs: introduce 
	> a grace period (with loose objects, we can use the ctime...)

	and this patch does that (except using mtime instead of ctime, for
	reasons explained in the commit message).

	Obviously, this patch is asking for a cousin, changing
	git-gc to use this option, and maybe introduce a config
	variable gc.pruneAge.

 Documentation/git-prune.txt |    5 ++++-
 builtin-prune.c             |   21 ++++++++++++++++++++-
 t/t1410-reflog.sh           |   18 ++++++++++++++++++
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
 
 SYNOPSIS
 --------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,9 @@ OPTIONS
 \--::
 	Do not interpret any more arguments as options.
 
+\--expire <time>::
+	Only expire loose objects older than <time>.
+
 <head>...::
 	In addition to objects
 	reachable from any of our references, keep objects
diff --git a/builtin-prune.c b/builtin-prune.c
index 44df59e..b5e7684 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -7,15 +7,24 @@
 
 static const char prune_usage[] = "git-prune [-n]";
 static int show_only;
+static unsigned long expire;
 
 static int prune_object(char *path, const char *filename, const unsigned char *sha1)
 {
+	const char *fullpath = mkpath("%s/%s", path, filename);
+	if (expire) {
+		struct stat st;
+		if (lstat(fullpath, &st))
+			return error("Could not stat '%s'", fullpath);
+		if (st.st_mtime > expire)
+			return 0;
+	}
 	if (show_only) {
 		enum object_type type = sha1_object_info(sha1, NULL);
 		printf("%s %s\n", sha1_to_hex(sha1),
 		       (type > 0) ? typename(type) : "unknown");
 	} else
-		unlink(mkpath("%s/%s", path, filename));
+		unlink(fullpath);
 	return 0;
 }
 
@@ -85,6 +94,16 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 			show_only = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--expire")) {
+			if (++i < argc) {
+				expire = approxidate(argv[i]);
+				continue;
+			}
+		}
+		else if (!prefixcmp(arg, "--expire=")) {
+			expire = approxidate(arg + 9);
+			continue;
+		}
 		usage(prune_usage);
 	}
 
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 12a53ed..f093802 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -201,4 +201,22 @@ test_expect_success 'delete' '
 	! grep dragon < output
 '
 
+test_expect_success 'prune --expire' '
+
+	BLOB=$(echo aleph | git hash-object -w --stdin) &&
+	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
+	test 20 = $(git count-objects | sed "s/ .*//") &&
+	test -f $BLOB_FILE &&
+	git reset --hard &&
+	if touch -d "Jan 1 1970" $BLOB_FILE
+	then
+		git prune --expire 1.day &&
+		test 19 = $(git count-objects | sed "s/ .*//") &&
+		! test -f $BLOB_FILE
+	else
+		say "Skipping test due to non-working touch -d"
+	fi
+
+'
+
 test_done
-- 
1.5.3.6.2087.g788ea4


* Re: [PATCH] Add "--expire <time>" option to 'git prune'
  2007-11-29 14:21             ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
@ 2007-11-29 14:35               ` Johannes Sixt
  2007-11-29 15:22                 ` [PATCH v2] " Johannes Schindelin
  2007-11-29 15:12               ` [PATCH] " Jeff King
  2007-11-29 20:06               ` Junio C Hamano
  2 siblings, 1 reply; 14+ messages in thread
From: Johannes Sixt @ 2007-11-29 14:35 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git, pasky

Johannes Schindelin schrieb:
> +test_expect_success 'prune --expire' '
> +
> +	BLOB=$(echo aleph | git hash-object -w --stdin) &&
> +	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
> +	test 20 = $(git count-objects | sed "s/ .*//") &&
> +	test -f $BLOB_FILE &&
> +	git reset --hard &&

Here you could throw in:

	git prune --expire=1.hour.ago &&
	test 20 = $(git count-objects | sed "s/ .*//") &&
	test -f $BLOB_FILE &&

to test that the object is not pruned (and the alternate --expire syntax).

> +	if touch -d "Jan 1 1970" $BLOB_FILE
> +	then
> +		git prune --expire 1.day &&
> +		test 19 = $(git count-objects | sed "s/ .*//") &&
> +		! test -f $BLOB_FILE
> +	else
> +		say "Skipping test due to non-working touch -d"
> +	fi

-- Hannes


* Re: [PATCH] Add "--expire <time>" option to 'git prune'
  2007-11-29 14:21             ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
  2007-11-29 14:35               ` Johannes Sixt
@ 2007-11-29 15:12               ` Jeff King
  2007-11-29 16:13                 ` Johannes Schindelin
  2007-11-29 20:06               ` Junio C Hamano
  2 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2007-11-29 15:12 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Johannes Sixt, git, pasky

On Thu, Nov 29, 2007 at 02:21:23PM +0000, Johannes Schindelin wrote:

> This patch adds a mode to expire only loose, unreachable objects
> which are older than a certain time.  For example, by
> 
> 	git prune --expire 14.days
> 
> you can prune only those objects which are loose, unreachable
> and older than 14 days (and thus probably outdated).

Does this now make git-prune safe for automatic running?

I suppose you could still be actively manipulating refs that point to
very old objects.

-Peff


* [PATCH v2] Add "--expire <time>" option to 'git prune'
  2007-11-29 14:35               ` Johannes Sixt
@ 2007-11-29 15:22                 ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-11-29 15:22 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git, pasky


Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.

This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time.  For example, by

	git prune --expire 14.days

you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).

The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	On Thu, 29 Nov 2007, Johannes Sixt wrote:

	> Johannes Schindelin schrieb:
	> > +test_expect_success 'prune --expire' '
	> > +
	> > +	BLOB=$(echo aleph | git hash-object -w --stdin) &&
	> > +	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
	> > +	test 20 = $(git count-objects | sed "s/ .*//") &&
	> > +	test -f $BLOB_FILE &&
	> > +	git reset --hard &&
	> 
	> Here you could throw in:
	> 
	> 	git prune --expire=1.hour.ago &&
	> 	test 20 = $(git count-objects | sed "s/ .*//") &&
	> 	test -f $BLOB_FILE &&
	> 
	> to test that the object is not pruned (and the alternate 
	> --expire syntax).

	Good idea!

 Documentation/git-prune.txt |    5 ++++-
 builtin-prune.c             |   21 ++++++++++++++++++++-
 t/t1410-reflog.sh           |   21 +++++++++++++++++++++
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
 
 SYNOPSIS
 --------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,9 @@ OPTIONS
 \--::
 	Do not interpret any more arguments as options.
 
+\--expire <time>::
+	Only expire loose objects older than <time>.
+
 <head>...::
 	In addition to objects
 	reachable from any of our references, keep objects
diff --git a/builtin-prune.c b/builtin-prune.c
index 44df59e..b5e7684 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -7,15 +7,24 @@
 
 static const char prune_usage[] = "git-prune [-n]";
 static int show_only;
+static unsigned long expire;
 
 static int prune_object(char *path, const char *filename, const unsigned char *sha1)
 {
+	const char *fullpath = mkpath("%s/%s", path, filename);
+	if (expire) {
+		struct stat st;
+		if (lstat(fullpath, &st))
+			return error("Could not stat '%s'", fullpath);
+		if (st.st_mtime > expire)
+			return 0;
+	}
 	if (show_only) {
 		enum object_type type = sha1_object_info(sha1, NULL);
 		printf("%s %s\n", sha1_to_hex(sha1),
 		       (type > 0) ? typename(type) : "unknown");
 	} else
-		unlink(mkpath("%s/%s", path, filename));
+		unlink(fullpath);
 	return 0;
 }
 
@@ -85,6 +94,16 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 			show_only = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--expire")) {
+			if (++i < argc) {
+				expire = approxidate(argv[i]);
+				continue;
+			}
+		}
+		else if (!prefixcmp(arg, "--expire=")) {
+			expire = approxidate(arg + 9);
+			continue;
+		}
 		usage(prune_usage);
 	}
 
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 12a53ed..3924dc4 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -201,4 +201,25 @@ test_expect_success 'delete' '
 	! grep dragon < output
 '
 
+test_expect_success 'prune --expire' '
+
+	BLOB=$(echo aleph | git hash-object -w --stdin) &&
+	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
+	test 20 = $(git count-objects | sed "s/ .*//") &&
+	test -f $BLOB_FILE &&
+	git reset --hard &&
+	git prune --expire=1.hour.ago &&
+	test 20 = $(git count-objects | sed "s/ .*//") &&
+	test -f $BLOB_FILE &&
+	if touch -d "Jan 1 1970" $BLOB_FILE
+	then
+		git prune --expire 1.day &&
+		test 19 = $(git count-objects | sed "s/ .*//") &&
+		! test -f $BLOB_FILE
+	else
+		say "Skipping test due to non-working touch -d"
+	fi
+
+'
+
 test_done
-- 
1.5.3.6.2088.g8c260


* Re: [PATCH] Add "--expire <time>" option to 'git prune'
  2007-11-29 15:12               ` [PATCH] " Jeff King
@ 2007-11-29 16:13                 ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-11-29 16:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Johannes Sixt, git, pasky

Hi,

On Thu, 29 Nov 2007, Jeff King wrote:

> On Thu, Nov 29, 2007 at 02:21:23PM +0000, Johannes Schindelin wrote:
> 
> > This patch adds a mode to expire only loose, unreachable objects
> > which are older than a certain time.  For example, by
> > 
> > 	git prune --expire 14.days
> > 
> > you can prune only those objects which are loose, unreachable
> > and older than 14 days (and thus probably outdated).
> 
> Does this now make git-prune safe for automatic running?
> 
> I suppose you could still be actively manipulating refs that point to
> very old objects.

That's why I want to have it configurable from git-gc.

Ciao,
Dscho


* Re: [PATCH] Add "--expire <time>" option to 'git prune'
  2007-11-29 14:21             ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
  2007-11-29 14:35               ` Johannes Sixt
  2007-11-29 15:12               ` [PATCH] " Jeff King
@ 2007-11-29 20:06               ` Junio C Hamano
  2007-11-29 20:59                 ` Johannes Schindelin
  2 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2007-11-29 20:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, git, pasky

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> The implementation uses st.st_mtime rather than st.st_ctime,
> because it can be tested better, using 'touch -d <time>' (and
> omitting the test when the platform does not support that
> command line switch).

But I think you can use more portable -t for setting mtime to
1970/01/01, but I had a feeling that earlier we were bitten by
non-portability of "touch" and introduced test-chmtime.


* [PATCH] Add "--expire <time>" option to 'git prune'
  2007-11-29 20:06               ` Junio C Hamano
@ 2007-11-29 20:59                 ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-11-29 20:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git, pasky


Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.

This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time.  For example, by

	git prune --expire 14.days

you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).

The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better: the test backdates the object
file with test-chmtime instead of relying on 'touch -d <time>',
which is not portable.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	On Thu, 29 Nov 2007, Junio C Hamano wrote:

	> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
	> 
	> > The implementation uses st.st_mtime rather than st.st_ctime,
	> > because it can be tested better, using 'touch -d <time>' (and
	> > omitting the test when the platform does not support that
	> > command line switch).
	> 
	> I think you can use the more portable -t to set the mtime to
	> 1970/01/01, but I have a feeling that we were bitten by the
	> non-portability of "touch" earlier and introduced test-chmtime
	> for that reason.

	Somehow that slipped by me.  This patch uses test-chmtime.

 Documentation/git-prune.txt |    5 ++++-
 builtin-prune.c             |   21 ++++++++++++++++++++-
 t/t1410-reflog.sh           |   17 +++++++++++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
 
 SYNOPSIS
 --------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,9 @@ OPTIONS
 \--::
 	Do not interpret any more arguments as options.
 
+\--expire <time>::
+	Only expire loose objects older than <time>.
+
 <head>...::
 	In addition to objects
 	reachable from any of our references, keep objects
diff --git a/builtin-prune.c b/builtin-prune.c
index 44df59e..b5e7684 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -7,15 +7,24 @@
 
 static const char prune_usage[] = "git-prune [-n]";
 static int show_only;
+static unsigned long expire;
 
 static int prune_object(char *path, const char *filename, const unsigned char *sha1)
 {
+	const char *fullpath = mkpath("%s/%s", path, filename);
+	if (expire) {
+		struct stat st;
+		if (lstat(fullpath, &st))
+			return error("Could not stat '%s'", fullpath);
+		if (st.st_mtime > expire)
+			return 0;
+	}
 	if (show_only) {
 		enum object_type type = sha1_object_info(sha1, NULL);
 		printf("%s %s\n", sha1_to_hex(sha1),
 		       (type > 0) ? typename(type) : "unknown");
 	} else
-		unlink(mkpath("%s/%s", path, filename));
+		unlink(fullpath);
 	return 0;
 }
 
@@ -85,6 +94,16 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 			show_only = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--expire")) {
+			if (++i < argc) {
+				expire = approxidate(argv[i]);
+				continue;
+			}
+		}
+		else if (!prefixcmp(arg, "--expire=")) {
+			expire = approxidate(arg + 9);
+			continue;
+		}
 		usage(prune_usage);
 	}
 
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 12a53ed..4a17573 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -201,4 +201,21 @@ test_expect_success 'delete' '
 	! grep dragon < output
 '
 
+test_expect_success 'prune --expire' '
+
+	BLOB=$(echo aleph | git hash-object -w --stdin) &&
+	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
+	test 20 = $(git count-objects | sed "s/ .*//") &&
+	test -f $BLOB_FILE &&
+	git reset --hard &&
+	git prune --expire=1.hour.ago &&
+	test 20 = $(git count-objects | sed "s/ .*//") &&
+	test -f $BLOB_FILE &&
+	test-chmtime -86400 $BLOB_FILE &&
+	git prune --expire 1.day &&
+	test 19 = $(git count-objects | sed "s/ .*//") &&
+	! test -f $BLOB_FILE
+
+'
+
 test_done
-- 
1.5.3.6.2088.g8c260
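
The age check in prune_object() above can be read in isolation as a
small predicate. The following is a minimal standalone sketch, assuming
the cutoff has already been resolved to an absolute timestamp (the patch
does this with approxidate()); the name older_than() is hypothetical and
does not appear in the patch.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper, not part of the patch: returns 1 if the file at
 * `path` was last modified at or before `cutoff`, 0 if it is newer, and
 * -1 if it cannot be examined. A cutoff of 0 means "no age limit", so
 * every file counts as old enough, matching the behaviour of the patch
 * when --expire is not given.
 */
int older_than(const char *path, time_t cutoff)
{
	struct stat st;

	if (!cutoff)
		return 1;
	if (lstat(path, &st) < 0)
		return -1;
	/* the patch keeps the object when st.st_mtime > expire */
	return st.st_mtime <= cutoff;
}
```

With a cutoff of time(NULL) - 14 * 24 * 3600, a loose object written a
few minutes ago yields 0 and is kept, while one untouched for a month
yields 1 and becomes a candidate for pruning if it is also unreachable.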

Thread overview: 14+ messages
2007-11-18 11:25 [RFC] Alternates and broken repos: A pack and prune scheme to avoid them Johannes Sixt
2007-11-18 18:39 ` Junio C Hamano
2007-11-18 20:01   ` Johannes Sixt
2007-11-18 20:10     ` Junio C Hamano
2007-11-29  3:41       ` [PATCH/RFC] Teach repack to optionally retain otherwise lost objects Johannes Schindelin
2007-11-29  6:15         ` Junio C Hamano
2007-11-29 11:57           ` Johannes Schindelin
2007-11-29 14:21             ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
2007-11-29 14:35               ` Johannes Sixt
2007-11-29 15:22                 ` [PATCH v2] " Johannes Schindelin
2007-11-29 15:12               ` [PATCH] " Jeff King
2007-11-29 16:13                 ` Johannes Schindelin
2007-11-29 20:06               ` Junio C Hamano
2007-11-29 20:59                 ` Johannes Schindelin
