* [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
@ 2008-11-13 23:20 Brandon Casey
0 siblings, 0 replies; 8+ messages in thread
From: Brandon Casey @ 2008-11-13 23:20 UTC (permalink / raw)
To: git; +Cc: Brandon Casey
Once upon a time, repack had only a single option which began with the first
letter of the alphabet. Then, a second was created which would repack
unreachable objects into the newly created pack so that git-gc --auto could
be invented. But the -a option was still necessary so that it could be
called every now and then to discard the unreachable objects that were being
repacked over and over and over into newly generated packs. Later, -A was
changed so that instead of repacking the unreachable objects, it ejected
them from the pack so that they resided in the object store in loose form,
to be garbage collected by git-prune according to normal expiry rules.

And so, -a lost its raison d'etre.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
---
This is on top of bc/maint-keep-pack
-brandon
Documentation/git-repack.txt | 25 ++++++++++++-------------
git-repack.sh | 8 ++++----
2 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index aaa8852..d04d5c2 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -32,21 +32,20 @@ OPTIONS
pack everything referenced into a single pack.
Especially useful when packing a repository that is used
for private development and there is no need to worry
- about people fetching via dumb protocols from it. Use
- with '-d'. This will clean up the objects that `git prune`
- leaves behind, but `git fsck --full` shows as
- dangling.
+ about people fetching via dumb protocols from it. If used
+ with '-d', then any unreachable objects in a previous pack will
+ become loose, unpacked objects, instead of being left in the
+ old pack. Unreachable objects are never intentionally added to
+ a pack, even when repacking. This option prevents unreachable
+ objects from being immediately deleted by way of being left in
+ the old pack and then removed. Instead, the loose unreachable
+ objects will be pruned according to normal expiry rules
+ with the next 'git-gc' invocation. See linkgit:git-gc[1].
-A::
- Same as `-a`, unless '-d' is used. Then any unreachable
- objects in a previous pack become loose, unpacked objects,
- instead of being left in the old pack. Unreachable objects
- are never intentionally added to a pack, even when repacking.
- This option prevents unreachable objects from being immediately
- deleted by way of being left in the old pack and then
- removed. Instead, the loose unreachable objects
- will be pruned according to normal expiry rules
- with the next 'git-gc' invocation. See linkgit:git-gc[1].
+ Same as `-a`. Historical note: the -a and -A options used to differ
+ in that -a did not leave unreachable objects unpacked. Instead,
+ they were removed along with the redundant pack (when -d was used).
-d::
After packing, if the newly created packs make some
diff --git a/git-repack.sh b/git-repack.sh
index 458a497..f1e21b9 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -7,8 +7,9 @@ OPTIONS_KEEPDASHDASH=
OPTIONS_SPEC="\
git repack [options]
--
-a pack everything in a single pack
-A same as -a, and turn unreachable objects loose
+a pack everything in a single pack, and turn unreachable objects
+ loose
+A same as -a
d remove redundant packs, and run git-prune-packed
f pass --no-reuse-object to git-pack-objects
n do not run git-update-server-info
@@ -29,8 +30,7 @@ while test $# != 0
do
case "$1" in
-n) no_update_info=t ;;
- -a) all_into_one=t ;;
- -A) all_into_one=t
+ -a|-A) all_into_one=t
unpack_unreachable=--unpack-unreachable ;;
-d) remove_redundant=t ;;
-q) quiet=-q ;;
--
1.6.0.3.552.g12334
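
To make the effect of the git-repack.sh hunk concrete, here is a minimal standalone sketch of the patched option handling. The function name parse_repack_flags is hypothetical and not part of git-repack.sh, and it assumes each flag is passed as a separate argument; it only mirrors the case statement from the hunk above and reports which variables end up set.

```sh
#!/bin/sh
# Sketch only: parse_repack_flags is a hypothetical helper, not git code.
# It mirrors the patched case statement, where -a and -A share one branch.
parse_repack_flags () {
	all_into_one=
	unpack_unreachable=
	remove_redundant=
	for arg
	do
		case "$arg" in
		-a|-A)	all_into_one=t
			unpack_unreachable=--unpack-unreachable ;;
		-d)	remove_redundant=t ;;
		esac
	done
	# Report the state the rest of the script would see.
	echo "all_into_one=$all_into_one unpack_unreachable=$unpack_unreachable remove_redundant=$remove_redundant"
}

parse_repack_flags -a -d
```

With the patch applied, "-a -d" and "-A -d" produce the same state, which is exactly the loss of the old "-a -d" behavior that the replies in this thread discuss.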
* [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
@ 2008-11-13 23:22 Brandon Casey
2008-11-14 0:02 ` Björn Steinbrink
0 siblings, 1 reply; 8+ messages in thread
From: Brandon Casey @ 2008-11-13 23:22 UTC (permalink / raw)
To: git
* Re: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
2008-11-13 23:22 [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior Brandon Casey
@ 2008-11-14 0:02 ` Björn Steinbrink
2008-11-14 0:53 ` Brandon Casey
0 siblings, 1 reply; 8+ messages in thread
From: Björn Steinbrink @ 2008-11-14 0:02 UTC (permalink / raw)
To: Brandon Casey; +Cc: git
On 2008.11.13 17:22:36 -0600, Brandon Casey wrote:
> Once upon a time, repack had only a single option which began with the first
> letter of the alphabet. Then, a second was created which would repack
> unreachable objects into the newly created pack so that git-gc --auto could
> be invented. But, the -a option was still necessary so that it could be
> called every now and then to discard the unreachable objects that were being
> repacked over and over and over into newly generated packs. Later, -A was
> changed so that instead of repacking the unreachable objects, it ejected
> them from the pack so that they resided in the object store in loose form,
> to be garbage collected by git-prune according to normal expiry rules.
>
> And so, -a lost its raison d'etre.
>
> Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
> ---
>
>
> This is on top of bc/maint-keep-pack

I didn't check all the (proposed) commits for that branch, so just let
me know if I'm missing anything, but doesn't this change mean that you
just lose what "-ad" did?

We have:
  -a   Create a new pack, containing all reachable objects
  -A   Same as -a
  -ad  Same as -a, and drop all old packs and loose objects
  -Ad  Same as -ad, but keep unreachable objects loose

-Ad is nice regarding its safety-net value, but e.g. after a large
filter-branch run, when refs/original and the reflogs have been cleaned,
you just want to get rid of all those old unreachable objects,
immediately. For example, after importing and massaging some large
history from SVN, the -Ad behaviour is definitely _not_ what I want
there. Writing a few thousand loose objects just to prune them is just a
waste of time.

Björn
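
The filter-branch cleanup described above could look roughly like the following sketch. It assumes the existing (pre-patch) meaning of "-a -d", where unreachable objects vanish together with the old packs. The names run, DRY_RUN and purge_rewritten_history are hypothetical helpers introduced only for illustration; the git commands themselves are printed rather than executed while DRY_RUN is set.

```sh
#!/bin/sh
# Sketch only: run, DRY_RUN and purge_rewritten_history are hypothetical.
# With DRY_RUN set, commands are printed instead of executed.
run () {
	if test -n "$DRY_RUN"
	then
		echo "$*"
	else
		"$@"
	fi
}

# Arguments: the backup refs that filter-branch left under refs/original.
purge_rewritten_history () {
	for ref
	do
		# Drop each backup ref so it no longer keeps old history alive.
		run git update-ref -d "$ref"
	done
	# Expire all reflog entries so they stop referencing old objects.
	run git reflog expire --expire=now --all
	# Repack reachable objects and delete the old packs in one step.
	run git repack -a -d
}

# Preview the commands without touching any repository.
DRY_RUN=1
purge_rewritten_history refs/original/refs/heads/master
```

In a real repository the ref list would typically come from something like `git for-each-ref --format='%(refname)' refs/original`, with DRY_RUN left unset. This is destructive: once the refs and reflogs are gone, the old history cannot be recovered from that repository.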
* Re: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
2008-11-14 0:02 ` Björn Steinbrink
@ 2008-11-14 0:53 ` Brandon Casey
2008-11-14 1:25 ` Björn Steinbrink
2008-11-14 2:22 ` Theodore Tso
0 siblings, 2 replies; 8+ messages in thread
From: Brandon Casey @ 2008-11-14 0:53 UTC (permalink / raw)
To: Björn Steinbrink; +Cc: git
Björn Steinbrink wrote:
> On 2008.11.13 17:22:36 -0600, Brandon Casey wrote:
>> Once upon a time, repack had only a single option which began with the first
>> letter of the alphabet. Then, a second was created which would repack
>> unreachable objects into the newly created pack so that git-gc --auto could
>> be invented. But, the -a option was still necessary so that it could be
>> called every now and then to discard the unreachable objects that were being
>> repacked over and over and over into newly generated packs. Later, -A was
>> changed so that instead of repacking the unreachable objects, it ejected
>> them from the pack so that they resided in the object store in loose form,
>> to be garbage collected by git-prune according to normal expiry rules.
>>
>> And so, -a lost its raison d'etre.
>>
>> Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
>> ---
>>
>>
>> This is on top of bc/maint-keep-pack
>
> I didn't check all the (proposed) commits for that branch, so just let
> me know if I'm missing anything, but doesn't this change mean that you
> just lose what "-ad" did?
yes.
> We have:
> -a Create a new pack, containing all reachable objects
> -A Same as -a
> -ad Same as -a, and drop all old packs and loose objects
by loose objects, I assume you mean packed unreachable objects.
> -Ad Same as -ad, but keep unreachable objects loose
>
> -Ad is nice regarding its safety-net value, but e.g. after a large
> filter-branch run, when refs/original and the reflogs have been cleaned,
> you just want to get rid of all those old unreachable objects,
> immediately. For example after importing and massaging some large
> history from SVN, the -Ad behaviour is definitely _not_ what I want
> there. Writing a few thousand loose objects just to prune them is just a
> waste of time.
hmm. That's a good point. Even though I think it is likely that the thousand
loose objects that are written will be small commit objects and not blobs,
this use case may be enough to trump the safety benefit provided by the
proposed change.
-brandon
* Re: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
2008-11-14 0:53 ` Brandon Casey
@ 2008-11-14 1:25 ` Björn Steinbrink
2008-11-14 1:36 ` Brandon Casey
2008-11-14 2:22 ` Theodore Tso
1 sibling, 1 reply; 8+ messages in thread
From: Björn Steinbrink @ 2008-11-14 1:25 UTC (permalink / raw)
To: Brandon Casey; +Cc: git
On 2008.11.13 18:53:29 -0600, Brandon Casey wrote:
> Björn Steinbrink wrote:
> > I didn't check all the (proposed) commits for that branch, so just let
> > me know if I'm missing anything, but doesn't this change mean that you
> > just lose what "-ad" did?
>
> yes.
>
> > We have:
> > -a Create a new pack, containing all reachable objects
> > -A Same as -a
> > -ad Same as -a, and drop all old packs and loose objects
>
> by loose objects, I assume you mean packed unreachable objects.
No, actually I just totally ignored the fact that -a of course already
deletes the loose objects. The packed unreachable objects are in the old
packs, so they're already included in the first half of my sentence ;-)
> > -Ad Same as -ad, but keep unreachable objects loose
> >
> > -Ad is nice regarding its safety-net value, but e.g. after a large
> > filter-branch run, when refs/original and the reflogs have been cleaned,
> > you just want to get rid of all those old unreachable objects,
> > immediately. For example after importing and massaging some large
> > history from SVN, the -Ad behaviour is definitely _not_ what I want
> > there. Writing a few thousand loose objects just to prune them is just a
> > waste of time.
>
> hmm. That's a good point. Even though I think it is likely that the thousand
> loose objects that are written will be small commit objects and not blobs,

When you only fix up merge commits, author information and such things,
then yes, most objects will be commits. And then it's not even that bad.

But a more interesting case is when in your old SCM you had multiple
projects in one repo, and you can't sanely separate them before the
import. So you might end up using the subdirectory filter a few times,
or even just drop a bunch of branches in each copy of your import.

And another one is when you had accidentally committed some huge, useless
files, and as you're switching to git now anyway, you want to get rid of
them, so you use an index-filter to drop them.

For those two cases, -Ad vs -ad can make a huge difference. I remember
someone on #git using a subdirectory filter on some project and trying
to get the repo to a sane size afterwards. -Ad took basically forever,
while -ad finished in 5 seconds or so.

> this use case may be enough to trump the safety benefit provided by the
> proposed change.
IMHO, "git gc" already provides enough safety. I tend to see "gc" as the
regular "just use it" tool, while repack gives me more control over how
I want things to be done, without forcing me to use the real plumbing or
to fumble around with the configuration for gc. And when I want control,
I'm generally prepared to shoot myself in the foot.
Björn
* Re: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
2008-11-14 1:25 ` Björn Steinbrink
@ 2008-11-14 1:36 ` Brandon Casey
2008-11-14 1:48 ` Björn Steinbrink
0 siblings, 1 reply; 8+ messages in thread
From: Brandon Casey @ 2008-11-14 1:36 UTC (permalink / raw)
To: Björn Steinbrink; +Cc: git
Björn Steinbrink wrote:
> On 2008.11.13 18:53:29 -0600, Brandon Casey wrote:
>> Björn Steinbrink wrote:
>>> I didn't check all the (proposed) commits for that branch, so just let
>>> me know if I'm missing anything, but doesn't this change mean that you
>>> just lose what "-ad" did?
>> yes.
>>
>>> We have:
>>> -a Create a new pack, containing all reachable objects
>>> -A Same as -a
>>> -ad Same as -a, and drop all old packs and loose objects
>> by loose objects, I assume you mean packed unreachable objects.
>
> No, actually I just totally ignored the fact that -a of course already
> deletes the loose objects.
Actually, I had forgotten that repack deletes any loose objects at all.
It does call prune-packed, but only when -d is used.
> IMHO, "git gc" already provides enough safety. I tend to see "gc" as the
> regular "just use it" tool, while repack gives me more control over how
> I want things to be done, without forcing me to use the real plumbing or
> to fumble around with the configuration for gc. And when I want control,
> I'm generally prepared to shoot myself in the foot.
I think you're right. Thanks for providing an example of a real use case.
-brandon
* Re: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
2008-11-14 1:36 ` Brandon Casey
@ 2008-11-14 1:48 ` Björn Steinbrink
0 siblings, 0 replies; 8+ messages in thread
From: Björn Steinbrink @ 2008-11-14 1:48 UTC (permalink / raw)
To: Brandon Casey; +Cc: git
On 2008.11.13 19:36:45 -0600, Brandon Casey wrote:
> Björn Steinbrink wrote:
> > On 2008.11.13 18:53:29 -0600, Brandon Casey wrote:
> >> Björn Steinbrink wrote:
> >>> We have:
> >>> -a Create a new pack, containing all reachable objects
> >>> -A Same as -a
> >>> -ad Same as -a, and drop all old packs and loose objects
> >> by loose objects, I assume you mean packed unreachable objects.
> >
> > No, actually I just totally ignored the fact that -a of course already
> > deletes the loose objects.
>
> Actually, I had forgotten that repack deletes any loose objects at all.
> It does call prune-packed, but only when -d is used.
Ugh, right. -a does not delete loose objects without -d. So, ignoring
the .keep stuff, my initial description was even right and I just
confused myself afterwards :-/
Thanks,
Björn
* Re: [RFC PATCH] repack: make repack -a equivalent to repack -A and drop previous -a behavior
2008-11-14 0:53 ` Brandon Casey
2008-11-14 1:25 ` Björn Steinbrink
@ 2008-11-14 2:22 ` Theodore Tso
1 sibling, 0 replies; 8+ messages in thread
From: Theodore Tso @ 2008-11-14 2:22 UTC (permalink / raw)
To: Brandon Casey; +Cc: Björn Steinbrink, git
On Thu, Nov 13, 2008 at 06:53:29PM -0600, Brandon Casey wrote:
> > -Ad is nice regarding its safety-net value, but e.g. after a large
> > filter-branch run, when refs/original and the reflogs have been cleaned,
> > you just want to get rid of all those old unreachable objects,
> > immediately. For example after importing and massaging some large
> > history from SVN, the -Ad behaviour is definitely _not_ what I want
> > there. Writing a few thousand loose objects just to prune them is just a
> > waste of time.
>
> hmm. That's a good point. Even though I think it is likely that the thousand
> loose objects that are written will be small commit objects and not blobs,
> this use case may be enough to trump the safety benefit provided by the
> proposed change.

The problem is even small commit objects take a full 4k (or whatever
your filesystem block size is) when they are ejected as loose objects.
As a result, the current "git gc" defaults can end up requiring far
*more* disk space than before, certainly while it is running, and
sometimes even after the "git gc" completes. (I then end up running
"git prune" to complete deletion of the ejected objects.)

Sometimes this gets so annoying that I'll run the individual commands
run by git-gc by hand, except I use git repack -ad instead of git
repack -A. If we are going to get rid of the distinction between git
repack -a and git repack -A, perhaps there can be a config option to
force the immediate ejection of the unreachable objects, instead of
creating loose objects?

If the goal is safety, it would be nice if git repack could create a
separate pack that only contained unreachable objects, and then have
git prune be able to remove a pack if it only contains unreachable
objects.

- Ted
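
The block-size argument above can be put into numbers with a small back-of-the-envelope sketch. The helper name loose_footprint_kib is hypothetical, and the estimate assumes every ejected object is smaller than one filesystem block, so each one occupies exactly one block; the 4096-byte default matches the "full 4k" mentioned above.

```sh
#!/bin/sh
# Sketch only: loose_footprint_kib is a hypothetical helper.
# Assumes each loose object fits in, and therefore occupies, one block.
# Usage: loose_footprint_kib <object count> [block size in bytes]
loose_footprint_kib () {
	count=$1
	block_size=${2:-4096}
	# One block per object, converted from bytes to KiB.
	echo $((count * block_size / 1024))
}

# 5000 small ejected objects on a filesystem with 4 KiB blocks.
loose_footprint_kib 5000
```

Under these assumptions, 5000 ejected objects occupy about 20000 KiB (roughly 20 MB) regardless of how small the objects themselves are. For measured rather than estimated figures in a real repository, `git count-objects -v` reports the number and size of loose objects.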