* [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
@ 2007-11-18 11:25 Johannes Sixt
2007-11-18 18:39 ` Junio C Hamano
From: Johannes Sixt @ 2007-11-18 11:25 UTC (permalink / raw)
To: git
As you know, repo.or.cz uses alternates in order to reduce the space that the
repositories of forked projects require.
Recently, it happened that a fork (4msysgit.git) became broken because it was
using an object that was pruned away from the repository that it was
borrowing from (mingw.git). This happened even though 4msysgit did not use
the branch of mingw.git that was rebased and whose objects were pruned. The
reason is that a merge in 4msysgit.git resulted in a blob that was also in
the rebased branch.
To avoid such situations I propose to introduce "attic" packs. They contain
objects that are unreachable by the local set of refs. Otherwise they are
used like regular packs.
git-repack produces "attic" packs like this:
- Places unreachable objects of the local object store into an "attic"
pack.
- Copies objects that are reachable but borrowed from an alternate, and that
are only in the alternate's "attic" packs, into the local regular pack.
git-prune removes "attic" packs.
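Below is a minimal sketch of the two repack rules above. It assumes that reachability and pack membership have already been computed for each object. struct obj, classify() and the DEST_* values are hypothetical names used for illustration; they are not part of git.

```c
#include <stdbool.h>

/* Hypothetical view of one object from a borrowing repository. */
struct obj {
	const char *name;    /* object name (SHA-1 in hex) */
	bool local;          /* stored in this repository's own object store */
	bool reachable;      /* reachable from this repository's refs */
	bool only_in_attic;  /* borrowed, and present only in the alternate's attic packs */
};

enum dest { DEST_SKIP, DEST_REGULAR, DEST_ATTIC };

/* Decide where repack would put an object under the proposed scheme. */
enum dest classify(const struct obj *o)
{
	if (o->local)
		return o->reachable ? DEST_REGULAR : DEST_ATTIC;
	/* A borrowed object is copied in only if the alternate is about to lose it. */
	if (o->reachable && o->only_in_attic)
		return DEST_REGULAR;
	return DEST_SKIP; /* keep borrowing it, or it is not needed at all */
}
```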
Then the strategy of garbage collection can be arranged in the following way:
- Repack by starting at the "most complete" repo and work towards the "most
borrowing" ones. During this phase "attic" packs are created. Borrowing repos
get a chance to salvage objects before the alternates prune them away.
- Prune by starting at the "most borrowing" repo and work towards the "most
complete" ones. During this phase the "attic" packs are cleaned up.
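The ordering can be sketched as follows. The sketch assumes that each repository borrows from at most one other repository and that there are no alternates cycles. struct repo and gc_order() are hypothetical names. Repack would walk order[] from front to back, and prune would walk it from back to front.

```c
/* Hypothetical model: each repository borrows from at most one parent (-1 = none). */
struct repo {
	const char *name;
	int parent;
};

/* Number of alternates hops from repos[i] to a repository that borrows nothing. */
static int borrow_depth(const struct repo *repos, int i)
{
	int depth = 0;
	while (repos[i].parent >= 0) {
		i = repos[i].parent;
		depth++;
	}
	return depth;
}

/* Fill order[] with indices sorted by borrow depth, "most complete" first.
   Insertion sort keeps the original order among repositories of equal depth. */
void gc_order(const struct repo *repos, int n, int *order)
{
	for (int i = 0; i < n; i++) {
		int d = borrow_depth(repos, i);
		int j = i;
		while (j > 0 && borrow_depth(repos, order[j - 1]) > d) {
			order[j] = order[j - 1];
			j--;
		}
		order[j] = i;
	}
}
```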
What do you think? Is this a way for a solution?
-- Hannes
* Re: [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
2007-11-18 11:25 [RFC] Alternates and broken repos: A pack and prune scheme to avoid them Johannes Sixt
@ 2007-11-18 18:39 ` Junio C Hamano
2007-11-18 20:01 ` Johannes Sixt
From: Junio C Hamano @ 2007-11-18 18:39 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git
Johannes Sixt <johannes.sixt@telecom.at> writes:
> Then the strategy of garbage collection can be arranged in the following way:
>
> - Repack by starting at the "most complete" repo and work towards the "most
> borrowing" ones. During this phase "attic" packs are created. Borrowing repos
> get a chance to salvage objects before the alternates prune them away.
>
> - Prune by starting at the "most borrowing" repo and work towards the "most
> complete" ones. During this phase the "attic" packs are cleaned up.
>
> What do you think? Is this a way for a solution?
I would imagine that would work as long as it can be controlled
when all the involved repositories are repacked and pruned, such
as on repo.or.cz case (but on the other hand it is not really
controlled well there and that is the reason you wrote the
message X-<).
* Re: [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
2007-11-18 18:39 ` Junio C Hamano
@ 2007-11-18 20:01 ` Johannes Sixt
2007-11-18 20:10 ` Junio C Hamano
From: Johannes Sixt @ 2007-11-18 20:01 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
On Sunday 18 November 2007 19:39, Junio C Hamano wrote:
> Johannes Sixt <johannes.sixt@telecom.at> writes:
> > Then the strategy of garbage collection can be arranged in the following
> > way:
> >
> > - Repack by starting at the "most complete" repo and work towards the
> > "most borrowing" ones. During this phase "attic" packs are created.
> > Borrowing repos get a chance to salvage objects before the alternates
> > prune them away.
> >
> > - Prune by starting at the "most borrowing" repo and work towards the
> > "most complete" ones. During this phase the "attic" packs are cleaned up.
> >
> > What do you think? Is this a way for a solution?
>
> I would imagine that would work as long as it can be controlled
> when all the involved repositories are repacked and pruned, such
> as on repo.or.cz case (but on the other hand it is not really
> controlled well there and that is the reason you wrote the
> message X-<).
Well, I think in many situations pack and prune can be controlled. To be
precise, if alternates are used pack and prune *must* be controlled.
Currently, the control is very simple: "don't prune" (and I don't recall ATM
what you must not do when you repack).
Anyway, judging from the responses so far it seems that people can live
with "don't prune" (or not using alternates) ;-) Repositories getting broken
this way isn't exactly my itch, either, so... I spelled out a possible
solution if someone wants to pick up the topic.
-- Hannes
* Re: [RFC] Alternates and broken repos: A pack and prune scheme to avoid them
2007-11-18 20:01 ` Johannes Sixt
@ 2007-11-18 20:10 ` Junio C Hamano
2007-11-29 3:41 ` [PATCH/RFC] Teach repack to optionally retain otherwise lost objects Johannes Schindelin
From: Junio C Hamano @ 2007-11-18 20:10 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git
Johannes Sixt <johannes.sixt@telecom.at> writes:
> On Sunday 18 November 2007 19:39, Junio C Hamano wrote:
> ...
>> I would imagine that would work as long as it can be controlled
>> when all the involved repositories are repacked and pruned, such
>> as on repo.or.cz case (but on the other hand it is not really
>> controlled well there and that is the reason you wrote the
>> message X-<).
>
> Well, I think in many situations pack and prune can be controlled. To be
> precise, if alternates are used pack and prune *must* be controlled.
> Currently, the control is very simple: "don't prune" (and I don't recall ATM
> what you must not do when you repack).
>
> Anyway, judging from the responses so far it seems that people can live
> with "don't prune" (or not using alternates) ;-)
Because my point was not "don't prune is good enough", I think
you are judging from too small a number of responses (in fact,
zero).
My point was that even the existing setup that is well known to
the public (i.e. repo.or.cz) does not seem to be controlled, and
adding a nicer mechanism (e.g. I do not think there currently is
a canned way to prepare a pack that contains only unreachable
objects --- you need to script it anew) for a better control may
not help the situation much, unless it is actually used.
* [PATCH/RFC] Teach repack to optionally retain otherwise lost objects
2007-11-18 20:10 ` Junio C Hamano
@ 2007-11-29 3:41 ` Johannes Schindelin
2007-11-29 6:15 ` Junio C Hamano
From: Johannes Schindelin @ 2007-11-29 3:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Sixt, git
When specifying --attic=<prefix>, the objects that would be lost when
calling repack with -d will be put into a packfile (or multiple
packfiles), using the file name prefix <prefix>.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
This implements the idea of Hannes.
The plan for repo.or.cz is now to invoke repack with
"--attic=attic", copy attic-*.{idx,pack} to all the forks'
object stores, and then delete the original attic-*.{idx,pack}.
The beauty of that approach is that the order in which the
repositories are repacked is no longer important.
This patch is marked RFC since there is a severe bottleneck
here: the new packs' indices are listed, sorted and made unique,
and every SHA-1 is printed twice; then the old packs' indices are
sorted and made unique. Then the combined result is sorted and only
the now-unique SHA-1s are actually packed.
(The sort is not necessary if there is only _one_ pack.
However, we cannot guarantee that.)
Of course, this is quick 'n dirty, and the price to be paid
is a substantial performance hit: in my tests, linux-2.6.git
needed half a second to show its pack's index, but that
sed 's/^.* //' | sort | uniq | sed p mantra needs 19 seconds.
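The pipeline works because of how often each name appears. Every distinct SHA-1 from the new packs is emitted twice (sed p), and every distinct SHA-1 from the old packs is emitted once. After the final sort, a name that occurs exactly once (uniq -u) was in an old pack but in none of the new packs. Below is a C sketch of the same counting trick. old_only_by_counting() is a hypothetical name, not code from the patch.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* "sort | uniq": sort v in place and return the number of distinct entries. */
static size_t sort_unique(const char **v, size_t n)
{
	size_t k = 0;
	qsort(v, n, sizeof(*v), cmp_str);
	for (size_t i = 0; i < n; i++)
		if (k == 0 || strcmp(v[k - 1], v[i]) != 0)
			v[k++] = v[i];
	return k;
}

/* Mimics: (new | sort | uniq | sed p; old | sort | uniq) | sort | uniq -u
   Stores old-only names in out[] (sorted) and returns their count. */
size_t old_only_by_counting(const char **new_names, size_t n_new,
			    const char **old_names, size_t n_old,
			    const char **out)
{
	const char **tmp = malloc((n_new + n_old + 1) * sizeof(*tmp));
	const char **all = malloc((2 * n_new + n_old + 1) * sizeof(*all));
	size_t n_all = 0, n_out = 0, u;

	if (!tmp || !all) {
		free(tmp);
		free(all);
		return 0;
	}
	memcpy(tmp, new_names, n_new * sizeof(*tmp));
	u = sort_unique(tmp, n_new);
	for (size_t i = 0; i < u; i++) { /* "sed p": each new name twice */
		all[n_all++] = tmp[i];
		all[n_all++] = tmp[i];
	}
	memcpy(tmp, old_names, n_old * sizeof(*tmp));
	u = sort_unique(tmp, n_old);
	for (size_t i = 0; i < u; i++)   /* each old name once */
		all[n_all++] = tmp[i];

	qsort(all, n_all, sizeof(*all), cmp_str);
	for (size_t i = 0; i < n_all; ) { /* "uniq -u": keep runs of length 1 */
		size_t j = i + 1;
		while (j < n_all && strcmp(all[i], all[j]) == 0)
			j++;
		if (j - i == 1)
			out[n_out++] = all[i];
		i = j;
	}
	free(tmp);
	free(all);
	return n_out;
}
```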
The obvious thing is to exploit the fact that the pack indices
are already sorted:
I started to patch git-show-index so that it takes an argument
--missing-objects, followed by the new pack index file names,
followed by --, followed by the old pack index file names.
Then it would traverse all of them simultaneously, outputting
only the SHA-1s of objects that are in an old pack, but not
in any of the new packs.
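The simultaneous traversal amounts to a merge over sorted inputs. The sketch below is simplified. It assumes that the new packs' names have already been merged into one sorted array and the old packs' names into another, so it sidesteps the open-file and index-format issues raised next. old_only_by_merge() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Both inputs are sorted ascending, as pack index entries are; duplicates are allowed.
   Emits each name that is in old_sorted but not in new_sorted, once. */
size_t old_only_by_merge(const char *const *new_sorted, size_t n_new,
			 const char *const *old_sorted, size_t n_old,
			 const char **out)
{
	size_t i = 0, j = 0, n_out = 0;

	while (j < n_old) {
		int c = (i < n_new) ? strcmp(new_sorted[i], old_sorted[j]) : 1;
		if (c < 0) {
			i++;            /* only in a new pack: skip */
		} else if (c == 0) {
			j++;            /* in both: the object is not lost */
		} else {
			/* old_sorted[j] is smaller than everything left in new_sorted */
			if (n_out == 0 || strcmp(out[n_out - 1], old_sorted[j]) != 0)
				out[n_out++] = old_sorted[j];
			j++;
		}
	}
	return n_out;
}
```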
Two issues: there might be a whole lot of pack files (Pasky
told me today that in one instance there were 416 pack files!)
and that might well exceed the maximum number of open files.
Second issue: there are two different pack index formats, and
the code is not easily refactored AFAICT.
Probably a better method would be not to read the files
simultaneously, but to fill a "struct decorate *" with objects
(which can be faked, as we do not really need to parse them)
of the new packs, and then only use decorate_lookup() to determine
for all old packs' objects if they are present in the new ones.
The latter approach would allow for a relatively easy refactoring
of show-index.c; just provide a callback for each entry.
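In git, struct decorate and decorate_lookup() are keyed by struct object pointers. The sketch below substitutes a small, hypothetical open-addressing hash set keyed by name strings to show the same idea: insert every name from the new packs, then look up each old-pack name. None of these functions are git's API.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hash set of object names, standing in for struct decorate. */
struct name_set {
	const char **slot;
	size_t cap; /* power of two, at least twice the number of entries */
};

static uint32_t fnv1a(const char *s)
{
	uint32_t h = 2166136261u;
	while (*s)
		h = (h ^ (unsigned char)*s++) * 16777619u;
	return h;
}

static int set_init(struct name_set *set, size_t expected)
{
	set->cap = 16;
	while (set->cap < 2 * expected)
		set->cap *= 2;
	set->slot = calloc(set->cap, sizeof(*set->slot));
	return set->slot ? 0 : -1;
}

static void set_add(struct name_set *set, const char *name)
{
	size_t i = fnv1a(name) & (set->cap - 1);
	while (set->slot[i] && strcmp(set->slot[i], name))
		i = (i + 1) & (set->cap - 1); /* linear probing */
	set->slot[i] = name;
}

static int set_contains(const struct name_set *set, const char *name)
{
	size_t i = fnv1a(name) & (set->cap - 1);
	while (set->slot[i]) {
		if (!strcmp(set->slot[i], name))
			return 1;
		i = (i + 1) & (set->cap - 1);
	}
	return 0;
}

/* Fill the set from the new packs, then keep old-pack names that are missing. */
size_t old_only_by_lookup(const char **new_names, size_t n_new,
			  const char **old_names, size_t n_old,
			  const char **out)
{
	struct name_set seen;
	size_t n_out = 0;

	if (set_init(&seen, n_new + n_old))
		return 0;
	for (size_t i = 0; i < n_new; i++)
		set_add(&seen, new_names[i]);
	for (size_t j = 0; j < n_old; j++) {
		if (!set_contains(&seen, old_names[j])) {
			out[n_out++] = old_names[j];
			set_add(&seen, old_names[j]); /* report duplicates once */
		}
	}
	free(seen.slot);
	return n_out;
}
```

Unlike the merge approach, this needs no sorted input and keeps no files open while it runs.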
However, I am way too tired today to do it.
Besides, a completely different idea just struck me: before
repacking, .git/objects/pack/* could be _hard linked_ to the
forkee's object stores. Then nothing in git-repack's code
needs to be changed.
Oh, well. I just wasted 1.5 hours.
Documentation/git-repack.txt | 4 +++
git-repack.sh | 40 +++++++++++++++++++++++++++++++++----
t/t5303-repack-attic.sh | 44 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 83 insertions(+), 5 deletions(-)
create mode 100644 t/t5303-repack-attic.sh
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 12e2079..ec2c2bf 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -84,6 +84,10 @@ OPTIONS
If specified, multiple packfiles may be created.
The default is unlimited.
+--attic=<prefix>::
+ Put all objects that would/will be lost when running with `-d`
+ into its own packfile(s), with file name prefix `<prefix>`.
+
Configuration
-------------
diff --git a/git-repack.sh b/git-repack.sh
index e18eb3f..f83d6f0 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -18,12 +18,13 @@ window= size of the window used for delta compression
window-memory= same as the above, but limit memory size instead of entries count
depth= limits the maximum delta depth
max-pack-size= maximum size of each packfile
+attic= pack no-longer-packed used objects into an "attic" pack
"
SUBDIRECTORY_OK='Yes'
. git-sh-setup
no_update_info= all_into_one= remove_redundant= keep_unreachable=
-local= quiet= no_reuse= extra=
+local= quiet= no_reuse= extra= attic=
while test $# != 0
do
case "$1" in
@@ -37,6 +38,8 @@ do
-l) local=--local ;;
--max-pack-size|--window|--window-memory|--depth)
extra="$extra $1=$2"; shift ;;
+ --attic)
+ attic="$2"; shift ;;
--) shift; break;;
*) usage ;;
esac
@@ -119,6 +122,36 @@ for name in $names ; do
rm -f "$PACKDIR/old-pack-$name.pack" "$PACKDIR/old-pack-$name.idx"
done
+new_existing=
+for e in $existing
+do
+ case " $fullbases " in
+ *" $e "*) ;;
+ *)
+ new_existing="$new_existing $e"
+ ;;
+ esac
+done
+existing="$new_existing"
+
+if test ! -z "$attic"
+then
+ # Find the objects which were in the existing packs, but are no
+ # longer in the new ones.
+ #
+ # This could be much more efficient.
+ (for name in $names
+ do
+ git show-index < "$PACKDIR/pack-$name.idx"
+ done | sed 's/^.* //' | sort | uniq | sed p &&
+ for e in $existing
+ do
+ git show-index < "$PACKDIR/$e.idx"
+ done | sed 's/^.* //' | sort | uniq) | sort | uniq -u |
+ git pack-objects --non-empty "$attic" ||
+ die "Could not create attic '$attic'."
+fi
+
if test "$remove_redundant" = t
then
# We know $existing are all redundant.
@@ -128,10 +161,7 @@ then
( cd "$PACKDIR" &&
for e in $existing
do
- case " $fullbases " in
- *" $e "*) ;;
- *) rm -f "$e.pack" "$e.idx" "$e.keep" ;;
- esac
+ rm -f "$e.pack" "$e.idx" "$e.keep"
done
)
fi
diff --git a/t/t5303-repack-attic.sh b/t/t5303-repack-attic.sh
new file mode 100644
index 0000000..9777748
--- /dev/null
+++ b/t/t5303-repack-attic.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='repack with an attic pack'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+
+ echo "Ten weary, footsore wanderers," > file &&
+ git add file &&
+ test_tick &&
+ git commit -m initial &&
+ echo "all in a woeful plight," >> file &&
+ test_tick &&
+ git commit -m second file &&
+ echo "sought shelter in a wayside-inn" >> file &&
+ test_tick &&
+ git commit -m third file &&
+ echo "one dark and lonely night." >> file &&
+ test_tick &&
+ git commit -m fourth file &&
+ echo ">>Nine rooms, no more<<, the landlord said," >> file &&
+ test_tick && git commit -m fifth file &&
+ git repack -a -d &&
+ ! ls .git/objects/??/*
+
+'
+
+test_expect_success 'create attic pack' '
+
+ LAST_VERSION=$(git rev-parse --verify HEAD:file) &&
+ git reset --hard HEAD^ &&
+ rm .git/logs/HEAD .git/logs/refs/heads/master &&
+ git cat-file blob $LAST_VERSION &&
+ git repack --attic=attic -a -d &&
+ ! git cat-file blob $LAST_VERSION &&
+ test -f attic-*.idx &&
+ cat attic-*.idx | git show-index | grep $LAST_VERSION
+
+'
+
+test_done
--
1.5.3.6.2066.g09421
* Re: [PATCH/RFC] Teach repack to optionally retain otherwise lost objects
2007-11-29 3:41 ` [PATCH/RFC] Teach repack to optionally retain otherwise lost objects Johannes Schindelin
@ 2007-11-29 6:15 ` Junio C Hamano
2007-11-29 11:57 ` Johannes Schindelin
From: Junio C Hamano @ 2007-11-29 6:15 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Johannes Sixt, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Besides, a completely different idea just struck me: before
> repacking, .git/objects/pack/* could be _hard linked_ to the
> forkee's object stores. Then nothing in git-repack's code
> needs to be changed.
>
> Oh, well. I just wasted 1.5 hours.
Your 1.5 hours was spent wisely to come up with that idea ;-).
To make sure I understand your idea correctly, the procedure to repack a
repository in a fork-friendly way is:
(1) find the project directly forked from you;
(2) hardlink all packs under your object store to their object store;
(3) repack -a -l and prune.
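Step (2) can be sketched in C as shown below. The sketch assumes that pack files live directly in one directory per repository and that both directories are on the same filesystem, which hard links require. link_packs() is hypothetical, and a real script would still have to find the forks, as in step (1).

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if name ends with suffix (and is longer than it). */
static int ends_with(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);
	return n > s && !strcmp(name + n - s, suffix);
}

/* Hard-link each regular *.pack and *.idx file of src_dir into dst_dir.
   Names that already exist in dst_dir are skipped.
   Returns the number of links created, or -1 on error. */
int link_packs(const char *src_dir, const char *dst_dir)
{
	DIR *dir = opendir(src_dir);
	struct dirent *de;
	int linked = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		char src[4096], dst[4096];
		struct stat st;

		if (!ends_with(de->d_name, ".pack") && !ends_with(de->d_name, ".idx"))
			continue;
		snprintf(src, sizeof(src), "%s/%s", src_dir, de->d_name);
		snprintf(dst, sizeof(dst), "%s/%s", dst_dir, de->d_name);
		if (stat(src, &st) || !S_ISREG(st.st_mode))
			continue;
		if (link(src, dst) == 0)
			linked++;
		else if (errno != EEXIST) {
			closedir(dir);
			return -1;
		}
	}
	closedir(dir);
	return linked;
}
```

Linking instead of copying costs no extra space. The fork's directory entry keeps the pack data alive after "repack -a -d" in the original repository removes its own entry.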
I think that would work as long as you do the above as a unit and handle
one repository at a time. Otherwise I think you risk losing necessary
objects when hierarchical forks are involved. E.g. if you have a
project X that has a fork Y which in turn has fork Z.
* Step 1 is run for X, Y and Z.
* Step 2 is run for Y and Z.
* Step 3 is run for Z.
At this point, Z is still borrowing objects from Y and X through Y, and
it will not keep objects it is borrowing from X through Y. Then if the
procedure is intermixed like this, a bad thing happens.
* Step 2 is run for X.
* Step 3 is run for Y.
* Step 3 is run for X.
Step 3 for Y would lose objects Y was borrowing from X that were not
used by Y itself. At this point, Z is still usable as the objects it is
borrowing from X through Y have not been pruned from X. But Step 3 for X
will lose them, rendering Z unusable.
* Re: [PATCH/RFC] Teach repack to optionally retain otherwise lost objects
2007-11-29 6:15 ` Junio C Hamano
@ 2007-11-29 11:57 ` Johannes Schindelin
2007-11-29 14:21 ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
From: Johannes Schindelin @ 2007-11-29 11:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Sixt, git
Hi,
On Wed, 28 Nov 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Besides, a completely different idea just struck me: before
> > repacking, .git/objects/pack/* could be _hard linked_ to the
> > forkee's object stores. Then nothing in git-repack's code
> > needs to be changed.
> >
> > Oh, well. I just wasted 1.5 hours.
>
> Your 1.5 hours was spent wisely to come up with that idea ;-).
Thanks ;-)
> To make sure I understand your idea correctly, the procedure to repack a
> repository in a fork-friendly way is:
>
> (1) find the project directly forked from you;
>
> (2) hardlink all packs under your object store to their object store;
>
> (3) repack -a -l and prune.
Yep.
> I think that would work as long as you do the above as a unit and handle
> one repository at a time.
Exactly. See
http://repo.or.cz/w/repo.git?a=commitdiff;h=fba501deabd349afbe3b8bf89f385889889e04ac
for a tired proposal.
Note that "prune" is not (yet) an option, since it could possibly destroy
objects which are needed in an ongoing push operation.
However, we could do exactly the same as with reflogs: introduce a grace
period (with loose objects, we can use the ctime...)
> Otherwise I think you risk losing necessary objects when hierarchical
> forks are involved. E.g. if you have a project X that has a fork Y
> which in turn has fork Z.
Well, in theory you could also iterate over all projects and hard link the
packs/objects of their alternates, and _then_ iterate and repack. But it
is simpler and more obvious in the case of repo.or.cz to do all in one
iteration, because we can order the repository names easily so that
forkees come first, _and_ we have an easy way to find out which
repositories are forks of a project.
Ciao,
Dscho
* [PATCH] Add "--expire <time>" option to 'git prune'
2007-11-29 11:57 ` Johannes Schindelin
@ 2007-11-29 14:21 ` Johannes Schindelin
2007-11-29 14:35 ` Johannes Sixt
` (2 more replies)
From: Johannes Schindelin @ 2007-11-29 14:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Sixt, git, pasky
Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.
This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time. For example, by
git prune --expire 14.days
you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).
The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
On Thu, 29 Nov 2007, Johannes Schindelin wrote:
> Note that "prune" is not (yet) an option [for repo.or.cz], since
> it could possibly destroy objects which are needed in an ongoing
> push operation.
>
> However, we could do exactly the same as with reflogs: introduce
> a grace period (with loose objects, we can use the ctime...)
and this patch does that (except that it uses mtime instead of ctime,
for reasons explained in the commit message).
Obviously, this patch is asking for a cousin, changing
git-gc to use this option, and maybe introduce a config
variable gc.pruneAge.
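The expiry test that this patch adds to prune_object() can be illustrated on its own. The sketch assumes the caller has already turned the --expire argument into a time_t cutoff; the patch uses approxidate() for that. old_enough_to_prune() is a hypothetical name. As in the patch, a cutoff of 0 means "no expiry", and an object is kept only when its mtime is strictly newer than the cutoff.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <time.h>

/* Returns 1 if the loose object at path may be pruned, 0 if it is newer
   than cutoff and must be kept, or -1 if it could not be lstat'ed. */
int old_enough_to_prune(const char *path, time_t cutoff)
{
	struct stat st;

	if (!cutoff)
		return 1;          /* no --expire given: prune unconditionally */
	if (lstat(path, &st))
		return -1;
	return st.st_mtime > cutoff ? 0 : 1;
}
```

For example, old_enough_to_prune(path, time(NULL) - 14 * 24 * 60 * 60) roughly corresponds to --expire 14.days, assuming approxidate() reads that value as "14 days ago".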
Documentation/git-prune.txt | 5 ++++-
builtin-prune.c | 21 ++++++++++++++++++++-
t/t1410-reflog.sh | 18 ++++++++++++++++++
3 files changed, 42 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
SYNOPSIS
--------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
DESCRIPTION
-----------
@@ -31,6 +31,9 @@ OPTIONS
\--::
Do not interpret any more arguments as options.
+\--expire <time>::
+ Only expire loose objects older than <time>.
+
<head>...::
In addition to objects
reachable from any of our references, keep objects
diff --git a/builtin-prune.c b/builtin-prune.c
index 44df59e..b5e7684 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -7,15 +7,24 @@
static const char prune_usage[] = "git-prune [-n]";
static int show_only;
+static unsigned long expire;
static int prune_object(char *path, const char *filename, const unsigned char *sha1)
{
+ const char *fullpath = mkpath("%s/%s", path, filename);
+ if (expire) {
+ struct stat st;
+ if (lstat(fullpath, &st))
+ return error("Could not stat '%s'", fullpath);
+ if (st.st_mtime > expire)
+ return 0;
+ }
if (show_only) {
enum object_type type = sha1_object_info(sha1, NULL);
printf("%s %s\n", sha1_to_hex(sha1),
(type > 0) ? typename(type) : "unknown");
} else
- unlink(mkpath("%s/%s", path, filename));
+ unlink(fullpath);
return 0;
}
@@ -85,6 +94,16 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
show_only = 1;
continue;
}
+ if (!strcmp(arg, "--expire")) {
+ if (++i < argc) {
+ expire = approxidate(argv[i]);
+ continue;
+ }
+ }
+ else if (!prefixcmp(arg, "--expire=")) {
+ expire = approxidate(arg + 9);
+ continue;
+ }
usage(prune_usage);
}
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 12a53ed..f093802 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -201,4 +201,22 @@ test_expect_success 'delete' '
! grep dragon < output
'
+test_expect_success 'prune --expire' '
+
+ BLOB=$(echo aleph | git hash-object -w --stdin) &&
+ BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
+ test 20 = $(git count-objects | sed "s/ .*//") &&
+ test -f $BLOB_FILE &&
+ git reset --hard &&
+ if touch -d "Jan 1 1970" $BLOB_FILE
+ then
+ git prune --expire 1.day &&
+ test 19 = $(git count-objects | sed "s/ .*//") &&
+ ! test -f $BLOB_FILE
+ else
+ say "Skipping test due to non-working touch -d"
+ fi
+
+'
+
test_done
--
1.5.3.6.2087.g788ea4
* Re: [PATCH] Add "--expire <time>" option to 'git prune'
2007-11-29 14:21 ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
@ 2007-11-29 14:35 ` Johannes Sixt
2007-11-29 15:22 ` [PATCH v2] " Johannes Schindelin
2007-11-29 15:12 ` [PATCH] " Jeff King
2007-11-29 20:06 ` Junio C Hamano
From: Johannes Sixt @ 2007-11-29 14:35 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git, pasky
Johannes Schindelin schrieb:
> +test_expect_success 'prune --expire' '
> +
> + BLOB=$(echo aleph | git hash-object -w --stdin) &&
> + BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
> + test 20 = $(git count-objects | sed "s/ .*//") &&
> + test -f $BLOB_FILE &&
> + git reset --hard &&
Here you could throw in:
git prune --expire=1.hour.ago &&
test 20 = $(git count-objects | sed "s/ .*//") &&
test -f $BLOB_FILE &&
to test that the object is not pruned (and the alternate --expire syntax).
> + if touch -d "Jan 1 1970" $BLOB_FILE
> + then
> + git prune --expire 1.day &&
> + test 19 = $(git count-objects | sed "s/ .*//") &&
> + ! test -f $BLOB_FILE
> + else
> + say "Skipping test due to non-working touch -d"
> + fi
-- Hannes
* Re: [PATCH] Add "--expire <time>" option to 'git prune'
2007-11-29 14:21 ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
2007-11-29 14:35 ` Johannes Sixt
@ 2007-11-29 15:12 ` Jeff King
2007-11-29 16:13 ` Johannes Schindelin
2007-11-29 20:06 ` Junio C Hamano
From: Jeff King @ 2007-11-29 15:12 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Johannes Sixt, git, pasky
On Thu, Nov 29, 2007 at 02:21:23PM +0000, Johannes Schindelin wrote:
> This patch adds a mode to expire only loose, unreachable objects
> which are older than a certain time. For example, by
>
> git prune --expire 14.days
>
> you can prune only those objects which are loose, unreachable
> and older than 14 days (and thus probably outdated).
Does this now make git-prune safe for automatic running?
I suppose you could still be actively manipulating refs that point to
very old objects.
-Peff
* [PATCH v2] Add "--expire <time>" option to 'git prune'
2007-11-29 14:35 ` Johannes Sixt
@ 2007-11-29 15:22 ` Johannes Schindelin
From: Johannes Schindelin @ 2007-11-29 15:22 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Junio C Hamano, git, pasky
Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.
This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time. For example, by
git prune --expire 14.days
you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).
The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
On Thu, 29 Nov 2007, Johannes Sixt wrote:
> Johannes Schindelin schrieb:
> > +test_expect_success 'prune --expire' '
> > +
> > + BLOB=$(echo aleph | git hash-object -w --stdin) &&
> > + BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
> > + test 20 = $(git count-objects | sed "s/ .*//") &&
> > + test -f $BLOB_FILE &&
> > + git reset --hard &&
>
> Here you could throw in:
>
> git prune --expire=1.hour.ago &&
> test 20 = $(git count-objects | sed "s/ .*//") &&
> test -f $BLOB_FILE &&
>
> to test that the object is not pruned (and the alternate
> --expire syntax).
Good idea!
Documentation/git-prune.txt | 5 ++++-
builtin-prune.c | 21 ++++++++++++++++++++-
t/t1410-reflog.sh | 21 +++++++++++++++++++++
3 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
SYNOPSIS
--------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
DESCRIPTION
-----------
@@ -31,6 +31,9 @@ OPTIONS
\--::
Do not interpret any more arguments as options.
+\--expire <time>::
+ Only expire loose objects older than <time>.
+
<head>...::
In addition to objects
reachable from any of our references, keep objects
diff --git a/builtin-prune.c b/builtin-prune.c
index 44df59e..b5e7684 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -7,15 +7,24 @@
static const char prune_usage[] = "git-prune [-n]";
static int show_only;
+static unsigned long expire;
static int prune_object(char *path, const char *filename, const unsigned char *sha1)
{
+ const char *fullpath = mkpath("%s/%s", path, filename);
+ if (expire) {
+ struct stat st;
+ if (lstat(fullpath, &st))
+ return error("Could not stat '%s'", fullpath);
+ if (st.st_mtime > expire)
+ return 0;
+ }
if (show_only) {
enum object_type type = sha1_object_info(sha1, NULL);
printf("%s %s\n", sha1_to_hex(sha1),
(type > 0) ? typename(type) : "unknown");
} else
- unlink(mkpath("%s/%s", path, filename));
+ unlink(fullpath);
return 0;
}
@@ -85,6 +94,16 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
show_only = 1;
continue;
}
+ if (!strcmp(arg, "--expire")) {
+ if (++i < argc) {
+ expire = approxidate(argv[i]);
+ continue;
+ }
+ }
+ else if (!prefixcmp(arg, "--expire=")) {
+ expire = approxidate(arg + 9);
+ continue;
+ }
usage(prune_usage);
}
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 12a53ed..3924dc4 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -201,4 +201,25 @@ test_expect_success 'delete' '
! grep dragon < output
'
+test_expect_success 'prune --expire' '
+
+ BLOB=$(echo aleph | git hash-object -w --stdin) &&
+ BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
+ test 20 = $(git count-objects | sed "s/ .*//") &&
+ test -f $BLOB_FILE &&
+ git reset --hard &&
+ git prune --expire=1.hour.ago &&
+ test 20 = $(git count-objects | sed "s/ .*//") &&
+ test -f $BLOB_FILE &&
+ if touch -d "Jan 1 1970" $BLOB_FILE
+ then
+ git prune --expire 1.day &&
+ test 19 = $(git count-objects | sed "s/ .*//") &&
+ ! test -f $BLOB_FILE
+ else
+ say "Skipping test due to non-working touch -d"
+ fi
+
+'
+
test_done
--
1.5.3.6.2088.g8c260
* Re: [PATCH] Add "--expire <time>" option to 'git prune'
2007-11-29 15:12 ` [PATCH] " Jeff King
@ 2007-11-29 16:13 ` Johannes Schindelin
From: Johannes Schindelin @ 2007-11-29 16:13 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Johannes Sixt, git, pasky
Hi,
On Thu, 29 Nov 2007, Jeff King wrote:
> On Thu, Nov 29, 2007 at 02:21:23PM +0000, Johannes Schindelin wrote:
>
> > This patch adds a mode to expire only loose, unreachable objects
> > which are older than a certain time. For example, by
> >
> > git prune --expire 14.days
> >
> > you can prune only those objects which are loose, unreachable
> > and older than 14 days (and thus probably outdated).
>
> Does this now make git-prune safe for automatic running?
>
> I suppose you could still be actively manipulating refs that point to
> very old objects.
That's why I want to have it configurable from git-gc.
Ciao,
Dscho
* Re: [PATCH] Add "--expire <time>" option to 'git prune'
2007-11-29 14:21 ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
2007-11-29 14:35 ` Johannes Sixt
2007-11-29 15:12 ` [PATCH] " Jeff King
@ 2007-11-29 20:06 ` Junio C Hamano
2007-11-29 20:59 ` Johannes Schindelin
From: Junio C Hamano @ 2007-11-29 20:06 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Johannes Sixt, git, pasky
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> The implementation uses st.st_mtime rather than st.st_ctime,
> because it can be tested better, using 'touch -d <time>' (and
> omitting the test when the platform does not support that
> command line switch).
But I think you can use more portable -t for setting mtime to
1970/01/01, but I had a feeling that earlier we were bitten by
non-portability of "touch" and introduced test-chmtime.
* [PATCH] Add "--expire <time>" option to 'git prune'
2007-11-29 20:06 ` Junio C Hamano
@ 2007-11-29 20:59 ` Johannes Schindelin
From: Johannes Schindelin @ 2007-11-29 20:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Sixt, git, pasky
Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.
This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time. For example, by
git prune --expire 14.days
you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).
The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
On Thu, 29 Nov 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > The implementation uses st.st_mtime rather than st.st_ctime,
> > because it can be tested better, using 'touch -d <time>' (and
> > omitting the test when the platform does not support that
> > command line switch).
>
> But I think you can use more portable -t for setting mtime to
> 1970/01/01, but I had a feeling that earlier we were bitten by
> non-portability of "touch" and introduced test-chmtime.
Somehow that slipped by me. This patch uses test-chmtime.
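This assumes that "test-chmtime -86400 FILE" moves the file's mtime back by one day. Under that assumption, the same effect is available portably from C through stat() and utime(), without relying on "touch -d" or "touch -t". shift_mtime() below is a hypothetical illustration, not the source of test-chmtime.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <utime.h>

/* Shift a file's mtime by delta seconds (negative moves it into the past),
   leaving its atime unchanged. Returns 0 on success, -1 on error. */
int shift_mtime(const char *path, long delta)
{
	struct stat st;
	struct utimbuf times;

	if (stat(path, &st))
		return -1;
	times.actime = st.st_atime;
	times.modtime = st.st_mtime + delta;
	return utime(path, &times);
}
```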
Documentation/git-prune.txt | 5 ++++-
builtin-prune.c | 21 ++++++++++++++++++++-
t/t1410-reflog.sh | 17 +++++++++++++++++
3 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
SYNOPSIS
--------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <expire>] [--] [<head>...]
DESCRIPTION
-----------
@@ -31,6 +31,9 @@ OPTIONS
\--::
Do not interpret any more arguments as options.
+\--expire <time>::
+ Only expire loose objects older than <time>.
+
<head>...::
In addition to objects
reachable from any of our references, keep objects
diff --git a/builtin-prune.c b/builtin-prune.c
index 44df59e..b5e7684 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -7,15 +7,24 @@
static const char prune_usage[] = "git-prune [-n]";
static int show_only;
+static unsigned long expire;
static int prune_object(char *path, const char *filename, const unsigned char *sha1)
{
+ const char *fullpath = mkpath("%s/%s", path, filename);
+ if (expire) {
+ struct stat st;
+ if (lstat(fullpath, &st))
+ return error("Could not stat '%s'", fullpath);
+ if (st.st_mtime > expire)
+ return 0;
+ }
if (show_only) {
enum object_type type = sha1_object_info(sha1, NULL);
printf("%s %s\n", sha1_to_hex(sha1),
(type > 0) ? typename(type) : "unknown");
} else
- unlink(mkpath("%s/%s", path, filename));
+ unlink(fullpath);
return 0;
}
@@ -85,6 +94,16 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
show_only = 1;
continue;
}
+ if (!strcmp(arg, "--expire")) {
+ if (++i < argc) {
+ expire = approxidate(argv[i]);
+ continue;
+ }
+ }
+ else if (!prefixcmp(arg, "--expire=")) {
+ expire = approxidate(arg + 9);
+ continue;
+ }
usage(prune_usage);
}
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 12a53ed..4a17573 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -201,4 +201,21 @@ test_expect_success 'delete' '
! grep dragon < output
'
+test_expect_success 'prune --expire' '
+
+ BLOB=$(echo aleph | git hash-object -w --stdin) &&
+ BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
+ test 20 = $(git count-objects | sed "s/ .*//") &&
+ test -f $BLOB_FILE &&
+ git reset --hard &&
+ git prune --expire=1.hour.ago &&
+ test 20 = $(git count-objects | sed "s/ .*//") &&
+ test -f $BLOB_FILE &&
+ test-chmtime -86400 $BLOB_FILE &&
+ git prune --expire 1.day &&
+ test 19 = $(git count-objects | sed "s/ .*//") &&
+ ! test -f $BLOB_FILE
+
+'
+
test_done
--
1.5.3.6.2088.g8c260
Thread overview: 14+ messages
2007-11-18 11:25 [RFC] Alternates and broken repos: A pack and prune scheme to avoid them Johannes Sixt
2007-11-18 18:39 ` Junio C Hamano
2007-11-18 20:01 ` Johannes Sixt
2007-11-18 20:10 ` Junio C Hamano
2007-11-29 3:41 ` [PATCH/RFC] Teach repack to optionally retain otherwise lost objects Johannes Schindelin
2007-11-29 6:15 ` Junio C Hamano
2007-11-29 11:57 ` Johannes Schindelin
2007-11-29 14:21 ` [PATCH] Add "--expire <time>" option to 'git prune' Johannes Schindelin
2007-11-29 14:35 ` Johannes Sixt
2007-11-29 15:22 ` [PATCH v2] " Johannes Schindelin
2007-11-29 15:12 ` [PATCH] " Jeff King
2007-11-29 16:13 ` Johannes Schindelin
2007-11-29 20:06 ` Junio C Hamano
2007-11-29 20:59 ` Johannes Schindelin