git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git gc & deleted branches
@ 2008-05-08 17:45 Guido Ostkamp
  2008-05-08 18:39 ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Guido Ostkamp @ 2008-05-08 17:45 UTC (permalink / raw)
  To: git

Hello,

I'm trying to reclaim space from an abandoned branch (never involved in 
any merge) using 'git gc', but it doesn't appear to work:

   mkdir testrepo
   cd testrepo
   git init
   dd if=/dev/urandom bs=1024k count=10 of=file
   git add file
   git commit -a -m 'initial checkin'
   git checkout -b test
   dd if=/dev/urandom bs=1024k count=10 of=file
   git commit -a -m 'branch checkin'
   git checkout master
   du -s .    # returns 30960
   git branch -D test
   git gc
   du -s .    # returns 30916

Here I had expected ~20000 since the branch uses ~10000.

My config is

[gc]
         reflogExpire = 0
         reflogExpireUnreachable = 0
         rerereresolved = 0
         rerereunresolved = 0
         packrefs = 1

I also tried 'git-pack-refs --all' or 'git-pack-refs --prune' but to no 
avail.

What am I doing wrong?

Thanks for any hints.

Regards

Guido

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 17:45 git gc & deleted branches Guido Ostkamp
@ 2008-05-08 18:39 ` Jeff King
  2008-05-08 18:55   ` Guido Ostkamp
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-08 18:39 UTC (permalink / raw)
  To: Guido Ostkamp; +Cc: git

On Thu, May 08, 2008 at 07:45:31PM +0200, Guido Ostkamp wrote:

> [gc]
>         reflogExpire = 0
>         reflogExpireUnreachable = 0
>         rerereresolved = 0
>         rerereunresolved = 0
>         packrefs = 1

git-gc uses a "safe" pruning mode, where it only prunes unreferenced
objects that are older than a certain period (this makes it safe to run
git-gc, even if other processes are creating objects at the same time).

So try

[gc]
        pruneExpire = now

Alternatively, you can just run 'git prune' manually instead of 'git
gc'.

> I also tried 'git-pack-refs --all' or 'git-pack-refs --prune' but to no  
> avail.

Those won't help at all; they are purely about moving refs from
individual files into the 'packed-refs' file.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 18:39 ` Jeff King
@ 2008-05-08 18:55   ` Guido Ostkamp
  2008-05-08 20:07     ` Brandon Casey
  2008-05-08 20:51     ` Jeff King
  0 siblings, 2 replies; 50+ messages in thread
From: Guido Ostkamp @ 2008-05-08 18:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, 8 May 2008, Jeff King wrote:
> On Thu, May 08, 2008 at 07:45:31PM +0200, Guido Ostkamp wrote:
>
>> [gc]
>>         reflogExpire = 0
>>         reflogExpireUnreachable = 0
>>         rerereresolved = 0
>>         rerereunresolved = 0
>>         packrefs = 1
>
> git-gc uses a "safe" pruning mode, where it only prunes unreferenced
> objects that are older than a certain period (this makes it safe to run
> git-gc, even if other processes are creating objects at the same time).
>
> So try
>
> [gc]
>        pruneExpire = now
>
> Alternatively, you can just run 'git prune' manually instead of 'git
> gc'.

Jeff, I tried it, but it has no effect (see below). There is only the 
master branch left, and only one commit therein, still it uses the space 
former occupied by the branch. I'm using git version 1.5.5.1.147.g867f.

Any further ideas?


$ git config -l
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
gc.reflogexpire=0
gc.reflogexpireunreachable=0
gc.rerereresolved=0
gc.rerereunresolved=0
gc.packrefs=1
gc.pruneexpire=now

$ git gc
Counting objects: 6, done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)

$ git prune

$ git branch
* master

$ du -s .
30820   .

$ git log
commit 9717437cdcb2a4457f28f41db5f6fad9ca55b54e
Author: Testuser <testuser@bianca.dialin.t-online.de>
Date:   Thu May 8 19:40:06 2008 +0200

     initial checkin

$ ls -l
total 10240
-rw-r--r-- 1 testuser users 10485760 May  8 19:40 file

Regards

Guido

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 18:55   ` Guido Ostkamp
@ 2008-05-08 20:07     ` Brandon Casey
  2008-05-08 20:52       ` Guido Ostkamp
  2008-05-08 20:56       ` Jeff King
  2008-05-08 20:51     ` Jeff King
  1 sibling, 2 replies; 50+ messages in thread
From: Brandon Casey @ 2008-05-08 20:07 UTC (permalink / raw)
  To: Guido Ostkamp; +Cc: Jeff King, git

Guido Ostkamp wrote:
> On Thu, 8 May 2008, Jeff King wrote:
>> On Thu, May 08, 2008 at 07:45:31PM +0200, Guido Ostkamp wrote:
>>
>>> [gc]
>>>         reflogExpire = 0
>>>         reflogExpireUnreachable = 0
>>>         rerereresolved = 0
>>>         rerereunresolved = 0
>>>         packrefs = 1
>>
>> git-gc uses a "safe" pruning mode, where it only prunes unreferenced
>> objects that are older than a certain period (this makes it safe to run
>> git-gc, even if other processes are creating objects at the same time).
>>
>> So try
>>
>> [gc]
>>        pruneExpire = now
>>
>> Alternatively, you can just run 'git prune' manually instead of 'git
>> gc'.
> 
> Jeff, I tried it, but it has no effect (see below). There is only the
> master branch left, and only one commit therein, still it uses the space
> former occupied by the branch. I'm using git version 1.5.5.1.147.g867f.
> 
> Any further ideas?


Possibly that object got packed? git-prune only removes loose objects.
Try 'git gc --prune' which will call git-repack with the -a option.

btw, this is _really_ a non-issue. It seems to keep coming up on the list.

Just know that each one of the config options that you set to zero, including
the one Jeff suggested setting to "now", is a safety mechanism that is there
to ensure that you never ever lose data and that mistakes are recoverable.

And be assured that the objects referenced by a deleted branch will be removed
from the repository eventually as long as 'git gc --prune' is run periodically.

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 18:55   ` Guido Ostkamp
  2008-05-08 20:07     ` Brandon Casey
@ 2008-05-08 20:51     ` Jeff King
  1 sibling, 0 replies; 50+ messages in thread
From: Jeff King @ 2008-05-08 20:51 UTC (permalink / raw)
  To: Guido Ostkamp; +Cc: git

On Thu, May 08, 2008 at 08:55:50PM +0200, Guido Ostkamp wrote:

> Jeff, I tried it, but it has no effect (see below). There is only the  
> master branch left, and only one commit therein, still it uses the space  
> former occupied by the branch. I'm using git version 1.5.5.1.147.g867f.

It worked fine for me; it's possible, as Brandon mentioned, that it is
in a pack already, and only a "repack -a" would get rid of it. FWIW, my
steps were:

  # ...same as you for repo and branch creation
  git branch -D test
  git config gc.reflogexpire 0
  git config gc.reflogexpireunreachable 0
  git config gc.pruneexpire now
  git gc
  du -s .git ;# shows 10396

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 20:07     ` Brandon Casey
@ 2008-05-08 20:52       ` Guido Ostkamp
  2008-05-08 21:01         ` Jeff King
  2008-05-08 20:56       ` Jeff King
  1 sibling, 1 reply; 50+ messages in thread
From: Guido Ostkamp @ 2008-05-08 20:52 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Jeff King, git

On Thu, 8 May 2008, Brandon Casey wrote:
> Possibly that object got packed? git-prune only removes loose objects. 
> Try 'git gc --prune' which will call git-repack with the -a option.

Thanks, Brandon - that did the trick :-)

> Just know that each one of the config options that you set to zero, 
> including the one Jeff suggested setting to "now", is a safety mechanism 
> that is there to ensure that you never ever lose data and that mistakes 
> are recoverable.

I am aware of this. However, at work I am unfortunately bound to a very 
restrictive filesystem quota on central development servers, so every 
single byte counts in (our official versioning control system is ClearCase 
where less space is required due to working tree and history being 
supplied through virtual filesystems).

> And be assured that the objects referenced by a deleted branch will be 
> removed from the repository eventually as long as 'git gc --prune' is 
> run periodically.

Ok. I did not know about the 'prune' option yet as it neither mentioned in 
the "Git Tutorial" nor "Everyday Git", there only 'git gc' is used with no 
options.

Regards

Guido

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 20:07     ` Brandon Casey
  2008-05-08 20:52       ` Guido Ostkamp
@ 2008-05-08 20:56       ` Jeff King
  1 sibling, 0 replies; 50+ messages in thread
From: Jeff King @ 2008-05-08 20:56 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Guido Ostkamp, git

On Thu, May 08, 2008 at 03:07:53PM -0500, Brandon Casey wrote:

> btw, this is _really_ a non-issue. It seems to keep coming up on the list.
> 
> Just know that each one of the config options that you set to zero, including
> the one Jeff suggested setting to "now", is a safety mechanism that is there
> to ensure that you never ever lose data and that mistakes are recoverable.

Yes, I want to chime in since I have been giving advice in such threads:
Please don't construe my help as any sort of endorsement of this
behavior. Git tries hard not to lose your data, and it is almost always
a bad idea to try to override these safety checks unless you really know
what you are doing.

And even then, try to consider balancing a bit of freed disk space (and
generally _no_ performance gain, because git is very good about not
looking at objects that aren't necessary to the current operation)
versus thinking "oops, I wish I still had that data" in a few days.

I can think offhand of only one time when it was truly useful for me to
prune aggressively, and it was a very special case: a pathologically
large repo for which I was doing a one-shot conversion from another
format (and I wanted to prune failed attempts).

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 20:52       ` Guido Ostkamp
@ 2008-05-08 21:01         ` Jeff King
  2008-05-08 21:15           ` Nicolas Pitre
  2008-05-08 21:33           ` Guido Ostkamp
  0 siblings, 2 replies; 50+ messages in thread
From: Jeff King @ 2008-05-08 21:01 UTC (permalink / raw)
  To: Guido Ostkamp; +Cc: Brandon Casey, git

On Thu, May 08, 2008 at 10:52:19PM +0200, Guido Ostkamp wrote:

>> And be assured that the objects referenced by a deleted branch will be  
>> removed from the repository eventually as long as 'git gc --prune' is  
>> run periodically.
>
> Ok. I did not know about the 'prune' option yet as it neither mentioned in 
> the "Git Tutorial" nor "Everyday Git", there only 'git gc' is used with no 
> options.

It is deprecated; see 25ee9731.

According to that commit message, prune is now a no-op. However, it
looks like it is still used for trigger a "repack -a" rather than
"repack -A". I don't know if it is worth making that behavior available
through some more sane command line option (I would think people who
really know that they want "repack -a" would just call it).

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:01         ` Jeff King
@ 2008-05-08 21:15           ` Nicolas Pitre
  2008-05-08 21:17             ` Jeff King
  2008-05-08 21:33           ` Guido Ostkamp
  1 sibling, 1 reply; 50+ messages in thread
From: Nicolas Pitre @ 2008-05-08 21:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Guido Ostkamp, Brandon Casey, git

On Thu, 8 May 2008, Jeff King wrote:

> On Thu, May 08, 2008 at 10:52:19PM +0200, Guido Ostkamp wrote:
> 
> >> And be assured that the objects referenced by a deleted branch will be  
> >> removed from the repository eventually as long as 'git gc --prune' is  
> >> run periodically.
> >
> > Ok. I did not know about the 'prune' option yet as it neither mentioned in 
> > the "Git Tutorial" nor "Everyday Git", there only 'git gc' is used with no 
> > options.
> 
> It is deprecated; see 25ee9731.
> 
> According to that commit message, prune is now a no-op. However, it
> looks like it is still used for trigger a "repack -a" rather than
> "repack -A". I don't know if it is worth making that behavior available
> through some more sane command line option (I would think people who
> really know that they want "repack -a" would just call it).

Well, actually this is a problem.

I think it is a good thing to deprecate gc --prune.  but if that means 
that repack -a is never used then unreferenced and expired objects will 
never be pruned if they're packed if one is always using 'git gc' as we 
are advocating.


Nicolas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:15           ` Nicolas Pitre
@ 2008-05-08 21:17             ` Jeff King
  2008-05-08 21:23               ` Brandon Casey
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-08 21:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Guido Ostkamp, Brandon Casey, git

On Thu, May 08, 2008 at 05:15:34PM -0400, Nicolas Pitre wrote:

> > According to that commit message, prune is now a no-op. However, it
> > looks like it is still used for trigger a "repack -a" rather than
> > "repack -A". I don't know if it is worth making that behavior available
> 
> Well, actually this is a problem.
> 
> I think it is a good thing to deprecate gc --prune.  but if that means 
> that repack -a is never used then unreferenced and expired objects will 
> never be pruned if they're packed if one is always using 'git gc' as we 
> are advocating.

I thought that -A would eventually put them all into a single pack,
killing off the old packs.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:17             ` Jeff King
@ 2008-05-08 21:23               ` Brandon Casey
  2008-05-08 21:31                 ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Brandon Casey @ 2008-05-08 21:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Nicolas Pitre, Guido Ostkamp, git

Jeff King wrote:
> On Thu, May 08, 2008 at 05:15:34PM -0400, Nicolas Pitre wrote:
> 
>>> According to that commit message, prune is now a no-op. However, it
>>> looks like it is still used for trigger a "repack -a" rather than
>>> "repack -A". I don't know if it is worth making that behavior available
>> Well, actually this is a problem.
>>
>> I think it is a good thing to deprecate gc --prune.  but if that means 
>> that repack -a is never used then unreferenced and expired objects will 
>> never be pruned if they're packed if one is always using 'git gc' as we 
>> are advocating.
> 
> I thought that -A would eventually put them all into a single pack,
> killing off the old packs.

'-a' puts everything in a single pack and kills off old packs. Anything that
was unreachable is not repacked in the new pack.

'-A' does the same thing but it also repacks the unreachable objects that were
previously packed.

So if something gets packed that subsequently becomes unreachable it will never
be removed unless 'repack -a' is used.

Possibly --keep-unreachable should instead unpack the unreachable items which would
allow them to eventually be pruned based on pruneExpire. Then we could indeed
get rid of the --prune option to git-gc.

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:23               ` Brandon Casey
@ 2008-05-08 21:31                 ` Jeff King
  2008-05-08 21:40                   ` Brandon Casey
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-08 21:31 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Nicolas Pitre, Guido Ostkamp, git

On Thu, May 08, 2008 at 04:23:53PM -0500, Brandon Casey wrote:

> > I thought that -A would eventually put them all into a single pack,
> > killing off the old packs.
> 
> '-a' puts everything in a single pack and kills off old packs. Anything that
> was unreachable is not repacked in the new pack.
> 
> '-A' does the same thing but it also repacks the unreachable objects that were
> previously packed.

Ah, indeed. I hadn't looked closely at the -A behavior before. So yes,
we are never killing off prunable packed objects. Probably we could use
the same solution as "git prune --expire"; perhaps a
"--keep-unreachable=2.weeks.ago"?

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:01         ` Jeff King
  2008-05-08 21:15           ` Nicolas Pitre
@ 2008-05-08 21:33           ` Guido Ostkamp
  1 sibling, 0 replies; 50+ messages in thread
From: Guido Ostkamp @ 2008-05-08 21:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Brandon Casey, git

On Thu, 8 May 2008, Jeff King wrote:
> According to that commit message, prune is now a no-op. However, it 
> looks like it is still used for trigger a "repack -a" rather than 
> "repack -A".

I tried to look at this but found this option '-A' to be undocumented in 
the manpage (git/Documentation/git-repack.txt and what is generated from 
it).

Regards

Guido

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:31                 ` Jeff King
@ 2008-05-08 21:40                   ` Brandon Casey
  2008-05-08 21:44                     ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Brandon Casey @ 2008-05-08 21:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Nicolas Pitre, Guido Ostkamp, git

Jeff King wrote:
> On Thu, May 08, 2008 at 04:23:53PM -0500, Brandon Casey wrote:
> 
>>> I thought that -A would eventually put them all into a single pack,
>>> killing off the old packs.
>> '-a' puts everything in a single pack and kills off old packs. Anything that
>> was unreachable is not repacked in the new pack.
>>
>> '-A' does the same thing but it also repacks the unreachable objects that were
>> previously packed.
> 
> Ah, indeed. I hadn't looked closely at the -A behavior before. So yes,
> we are never killing off prunable packed objects. Probably we could use
> the same solution as "git prune --expire"; perhaps a
> "--keep-unreachable=2.weeks.ago"?

The 'prune --expire' behavior is based on object mtime (i.e. file modification time).
That is lost once something is packed right?

I was thinking that either repack or pack-objects could be modified to unpack those
unreachable objects and leave them loose, and also give them the timestamp of the
pack file they came from. Then the --expire behavior of git-prune could work normally
and remove them. This seems like it would work nicely since prune follows repack in
git-gc.

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:40                   ` Brandon Casey
@ 2008-05-08 21:44                     ` Jeff King
  2008-05-08 21:53                       ` Brandon Casey
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-08 21:44 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Nicolas Pitre, Guido Ostkamp, git

On Thu, May 08, 2008 at 04:40:20PM -0500, Brandon Casey wrote:

> The 'prune --expire' behavior is based on object mtime (i.e. file
> modification time).  That is lost once something is packed right?

Yes. You would have to use the pack mtime. But of course you would have
to actually _leave_ them in a pack, or they would just keep getting
added to the new pack.

> I was thinking that either repack or pack-objects could be modified to
> unpack those unreachable objects and leave them loose, and also give
> them the timestamp of the pack file they came from. Then the --expire
> behavior of git-prune could work normally and remove them. This seems
> like it would work nicely since prune follows repack in git-gc.

That is sensible, I think.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:44                     ` Jeff King
@ 2008-05-08 21:53                       ` Brandon Casey
  2008-05-08 22:48                         ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Brandon Casey @ 2008-05-08 21:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Nicolas Pitre, Guido Ostkamp, git

Jeff King wrote:
> On Thu, May 08, 2008 at 04:40:20PM -0500, Brandon Casey wrote:
> 
>> The 'prune --expire' behavior is based on object mtime (i.e. file
>> modification time).  That is lost once something is packed right?
> 
> Yes. You would have to use the pack mtime. But of course you would have
> to actually _leave_ them in a pack, or they would just keep getting
> added to the new pack.

I had the impression that unreachable objects would not be packed. Maybe it
was more of an assumption.

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 21:53                       ` Brandon Casey
@ 2008-05-08 22:48                         ` Jeff King
  2008-05-09  1:41                           ` Brandon Casey
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-08 22:48 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Nicolas Pitre, Guido Ostkamp, git

On Thu, May 08, 2008 at 04:53:20PM -0500, Brandon Casey wrote:

> > Yes. You would have to use the pack mtime. But of course you would have
> > to actually _leave_ them in a pack, or they would just keep getting
> > added to the new pack.
> 
> I had the impression that unreachable objects would not be packed. Maybe it
> was more of an assumption.

Look in builtin-pack-objects.c:1981-1982. We basically just say "if it's
in a pack now, then it should go into the new pack."

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-08 22:48                         ` Jeff King
@ 2008-05-09  1:41                           ` Brandon Casey
  2008-05-09  3:21                             ` Junio C Hamano
  2008-05-09  4:19                             ` git gc & deleted branches Jeff King
  0 siblings, 2 replies; 50+ messages in thread
From: Brandon Casey @ 2008-05-09  1:41 UTC (permalink / raw)
  To: git

Jeff King <peff <at> peff.net> writes:

> 
> On Thu, May 08, 2008 at 04:53:20PM -0500, Brandon Casey wrote:
> 
> > > Yes. You would have to use the pack mtime. But of course you would have
> > > to actually _leave_ them in a pack, or they would just keep getting
> > > added to the new pack.
> > 
> > I had the impression that unreachable objects would not be packed. Maybe it
> > was more of an assumption.
> 
> Look in builtin-pack-objects.c:1981-1982. We basically just say "if it's
> in a pack now, then it should go into the new pack."
> 
> -Peff
> 


Here's what I was thinking (posted using gmane):

diff --git a/git-repack.sh b/git-repack.sh
index e18eb3f..064c331 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -30,7 +30,7 @@ do
        -n)     no_update_info=t ;;
        -a)     all_into_one=t ;;
        -A)     all_into_one=t
-               keep_unreachable=--keep-unreachable ;;
+               keep_unreachable=t ;;
        -d)     remove_redundant=t ;;
        -q)     quiet=-q ;;
        -f)     no_reuse=--no-reuse-object ;;
@@ -78,9 +78,6 @@ case ",$all_into_one," in
        if test -z "$args"
        then
                args='--unpacked --incremental'
-       elif test -n "$keep_unreachable"
-       then
-               args="$args $keep_unreachable"
        fi
        ;;
 esac
@@ -116,7 +113,15 @@ for name in $names ; do
                echo >&2 "old-pack-$name.{pack,idx} in $PACKDIR."
                exit 1
        }
-       rm -f "$PACKDIR/old-pack-$name.pack" "$PACKDIR/old-pack-$name.idx"
+       rm -f "$PACKDIR/old-pack-$name.idx"
+       test -z "$keep_unreachable" ||
+         ! test -f "$PACKDIR/old-pack-$name.pack" ||
+         git unpack-objects < "$PACKDIR/old-pack-$name.pack" || {
+               echo >&2 "Failed unpacking unreachable objects from old pack"
+               echo >&2 "saved as old-pack-$name.pack in $PACKDIR."
+               exit 1
+       }
+       rm -f "$PACKDIR/old-pack-$name.pack"
 done
 
 if test "$remove_redundant" = t
@@ -130,7 +135,18 @@ then
                  do
                        case " $fullbases " in
                        *" $e "*) ;;
-                       *)      rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+                       *)
+                               rm -f "$e.idx" "$e.keep"
+                               if test -n "$keep_unreachable" &&
+                                  test -f "$e.pack"
+                               then
+                                       git unpack-objects < "$e.pack" || {
+                                               echo >&2 "Fail AVOID GMANE WRAP"
+                                               exit 1
+                                       }
+                               fi
+                               rm -f "$e.pack"
+                       ;;
                        esac
                  done
                )


Is the first invocation of unpack-objects necessary? pack-objects has created
a pack which hashes to the same name of a pack we already have, and we replace
the original with the new one. Is that what is happening? They will be identical
right?

Of course this won't set the timestamp on the created objects based on the
timestamp of the pack file, but this was easy. Setting the timestamp would be
proper, but what's another two weeks. Besides, for those users not manually
running git-gc, this code path won't even be executed until there are enough
pack files for git-gc to add -A to the repack options.

Then, for git-gc it should be enough to just always use -A with repack when
manually running it. Then --prune can be deprecated.

-brandon

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09  1:41                           ` Brandon Casey
@ 2008-05-09  3:21                             ` Junio C Hamano
       [not found]                               ` <ee63ef30805082105w7f04a2d1y65a4618aeb787cac@mail.gmail.com>
                                                 ` (4 more replies)
  2008-05-09  4:19                             ` git gc & deleted branches Jeff King
  1 sibling, 5 replies; 50+ messages in thread
From: Junio C Hamano @ 2008-05-09  3:21 UTC (permalink / raw)
  To: Brandon Casey; +Cc: git

Brandon Casey <drafnel@gmail.com> writes:

> @@ -116,7 +113,15 @@ for name in $names ; do
>                 echo >&2 "old-pack-$name.{pack,idx} in $PACKDIR."
>                 exit 1
>         }
> -       rm -f "$PACKDIR/old-pack-$name.pack" "$PACKDIR/old-pack-$name.idx"
> +       rm -f "$PACKDIR/old-pack-$name.idx"
> +       test -z "$keep_unreachable" ||
> +         ! test -f "$PACKDIR/old-pack-$name.pack" ||
> +         git unpack-objects < "$PACKDIR/old-pack-$name.pack" || {
> +               echo >&2 "Failed unpacking unreachable objects from old pack"
> +               echo >&2 "saved as old-pack-$name.pack in $PACKDIR."
> +               exit 1
> +       }

Neat trick.  Unreachable objects that are only in this pack will get
current timestamp and gets a new lease of life for two weeks and then will
disappear.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09  1:41                           ` Brandon Casey
  2008-05-09  3:21                             ` Junio C Hamano
@ 2008-05-09  4:19                             ` Jeff King
  2008-05-09 15:00                               ` Geert Bosch
  1 sibling, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-09  4:19 UTC (permalink / raw)
  To: Brandon Casey; +Cc: git

On Fri, May 09, 2008 at 01:41:30AM +0000, Brandon Casey wrote:

> Here's what I was thinking (posted using gmane):
> 
> diff --git a/git-repack.sh b/git-repack.sh
> index e18eb3f..064c331 100755
> --- a/git-repack.sh
> +++ b/git-repack.sh

I like it. It makes an easy rule to say "packed objects _never_ get
pruned, they only get demoted to loose objects." And then of course
we have sane rules for pruning loose objects.

> -       rm -f "$PACKDIR/old-pack-$name.pack" "$PACKDIR/old-pack-$name.idx"
> +       rm -f "$PACKDIR/old-pack-$name.idx"
> +       test -z "$keep_unreachable" ||
> +         ! test -f "$PACKDIR/old-pack-$name.pack" ||
> +         git unpack-objects < "$PACKDIR/old-pack-$name.pack" || {
> +               echo >&2 "Failed unpacking unreachable objects from old pack"
> +               echo >&2 "saved as old-pack-$name.pack in $PACKDIR."
> +               exit 1
> +       }
> +       rm -f "$PACKDIR/old-pack-$name.pack"
> [...]
> 
> Is the first invocation of unpack-objects necessary? pack-objects has created
> a pack which hashes to the same name of a pack we already have, and we replace
> the original with the new one. Is that what is happening? They will be
> identical right?

Yeah, that's what it looks like to me (that the first unpack is
unnecessary, because we will just be putting the new pack into place
that has all the same objects). AIUI, two packs with identical hashes
must contain the exact same objects.

> Of course this won't set the timestamp on the created objects based on the
> timestamp of the pack file, but this was easy. Setting the timestamp would be
> proper, but what's another two weeks. Besides, for those users not manually
> running git-gc, this code path won't even be executed until there are enough
> pack files for git-gc to add -A to the repack options.

I think the extra two weeks is fine.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09  4:19                             ` git gc & deleted branches Jeff King
@ 2008-05-09 15:00                               ` Geert Bosch
  2008-05-09 15:14                                 ` Brandon Casey
  0 siblings, 1 reply; 50+ messages in thread
From: Geert Bosch @ 2008-05-09 15:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Brandon Casey, git


On May 9, 2008, at 00:19, Jeff King wrote:

> I like it. It makes an easy rule to say "packed objects _never_ get
> pruned, they only get demoted to loose objects." And then of course
> we have sane rules for pruning loose objects.

Isn't there an issue with the "git gc" triggering because there
may be too many loose unreferenced objects?
Still, I do like the approach.

Maybe unreferenced objects and old refs should go to a .git/lost+found
directory and be expired from there. This has a couple of benefits:

   -  Easy to manually inspect or blow away any crud
   -  One git-gc run can make one pack in lost+found,
      avoiding huge numbers of loose objects (and massive disk use)
      when trying to do a large cleanup (to possibly reclaim disk space)
   -  Objects will not be accessible by ordinary git commands for a  
while,
      before they are really removed, avoiding surprises

Only some tools would look in the lost+found to restore stuff.

   -Geert

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09 15:00                               ` Geert Bosch
@ 2008-05-09 15:14                                 ` Brandon Casey
  2008-05-09 15:53                                   ` Jeff King
  2008-05-09 16:12                                   ` Nicolas Pitre
  0 siblings, 2 replies; 50+ messages in thread
From: Brandon Casey @ 2008-05-09 15:14 UTC (permalink / raw)
  To: Geert Bosch; +Cc: Jeff King, git

Geert Bosch wrote:
> 
> On May 9, 2008, at 00:19, Jeff King wrote:
> 
>> I like it. It makes an easy rule to say "packed objects _never_ get
>> pruned, they only get demoted to loose objects." And then of course
>> we have sane rules for pruning loose objects.
> 
> Isn't there an issue with the "git gc" triggering because there
> may be too many loose unreferenced objects?
> Still, I do like the approach.

This would be an argument for going the extra mile and having the loose
objects adopt the timestamp of their pack file. In the normal case they
would probably be pruned immediately during the same git-gc run.

> Maybe unreferenced objects and old refs should go to a .git/lost+found
> directory and be expired from there. This has a couple of benefits:

>   -  Objects will not be accessible by ordinary git commands for a while,
>      before they are really removed, avoiding surprises

Unreferenced objects are sometimes used by other repositories which have
this repository listed as an alternate. So it may not be a good idea to
make the unreferenced objects inaccessible.

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09 15:14                                 ` Brandon Casey
@ 2008-05-09 15:53                                   ` Jeff King
  2008-05-09 15:56                                     ` Brandon Casey
  2008-05-09 16:12                                   ` Nicolas Pitre
  1 sibling, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-09 15:53 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Geert Bosch, git

On Fri, May 09, 2008 at 10:14:12AM -0500, Brandon Casey wrote:

> >   -  Objects will not be accessible by ordinary git commands for a while,
> >      before they are really removed, avoiding surprises
> 
> Unreferenced objects are sometimes used by other repositories which have
> this repository listed as an alternate. So it may not be a good idea to
> make the unreferenced objects inaccessible.

But that is precisely what we're going to do, but in two weeks. Isn't it
better to have the dependent repo fail while the change is recoverable?

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09 15:53                                   ` Jeff King
@ 2008-05-09 15:56                                     ` Brandon Casey
  0 siblings, 0 replies; 50+ messages in thread
From: Brandon Casey @ 2008-05-09 15:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Geert Bosch, git

Jeff King wrote:
> On Fri, May 09, 2008 at 10:14:12AM -0500, Brandon Casey wrote:
> 
>>>   -  Objects will not be accessible by ordinary git commands for a while,
>>>      before they are really removed, avoiding surprises
>> Unreferenced objects are sometimes used by other repositories which have
>> this repository listed as an alternate. So it may not be a good idea to
>> make the unreferenced objects inaccessible.
> 
> But that is precisely what we're going to do, but in two weeks. Isn't it
> better to have the dependent repo fail while the change is recoverable?

good point.

-b

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09 15:14                                 ` Brandon Casey
  2008-05-09 15:53                                   ` Jeff King
@ 2008-05-09 16:12                                   ` Nicolas Pitre
  2008-05-09 16:54                                     ` Brandon Casey
  2008-05-09 22:33                                     ` Junio C Hamano
  1 sibling, 2 replies; 50+ messages in thread
From: Nicolas Pitre @ 2008-05-09 16:12 UTC (permalink / raw)
  To: Brandon Casey; +Cc: Geert Bosch, Jeff King, git

On Fri, 9 May 2008, Brandon Casey wrote:

> Geert Bosch wrote:
> > 
> > On May 9, 2008, at 00:19, Jeff King wrote:
> > 
> >> I like it. It makes an easy rule to say "packed objects _never_ get
> >> pruned, they only get demoted to loose objects." And then of course
> >> we have sane rules for pruning loose objects.
> > 
> > Isn't there an issue with the "git gc" triggering because there
> > may be too many loose unreferenced objects?
> > Still, I do like the approach.
> 
> This would be an argument for going the extra mile and having the loose
> objects adopt the timestamp of their pack file. In the normal case they
> would probably be pruned immediately during the same git-gc run.

Well, not necessarily.  If you created a large branch yesterday and you 
are deleting it today, then if you repacked in between means that those 
loose objects won't be more than one day old.  Yet there could be enough 
of them to trigger auto gc.  But that auto gc won't pack those objects 
since they are unreferenced.  Hence auto gc will trigger all the time 
without making any progress.

> > Maybe unreferenced objects and old refs should go to a .git/lost+found
> > directory and be expired from there. This has a couple of benefits:
> 
> >   -  Objects will not be accessible by ordinary git commands for a while,
> >      before they are really removed, avoiding surprises
> 
> Unreferenced objects are sometimes used by other repositories which have
> this repository listed as an alternate. So it may not be a good idea to
> make the unreferenced objects inaccessible.

Nah.  If this is really the case then you shouldn't be running gc at all 
in the first place.


Nicolas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09 16:12                                   ` Nicolas Pitre
@ 2008-05-09 16:54                                     ` Brandon Casey
  2008-05-09 22:33                                     ` Junio C Hamano
  1 sibling, 0 replies; 50+ messages in thread
From: Brandon Casey @ 2008-05-09 16:54 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Geert Bosch, Jeff King, git

Nicolas Pitre wrote:
> On Fri, 9 May 2008, Brandon Casey wrote:
> 
>> Geert Bosch wrote:
>>> On May 9, 2008, at 00:19, Jeff King wrote:
>>>
>>>> I like it. It makes an easy rule to say "packed objects _never_ get
>>>> pruned, they only get demoted to loose objects." And then of course
>>>> we have sane rules for pruning loose objects.
>>> Isn't there an issue with the "git gc" triggering because there
>>> may be too many loose unreferenced objects?
>>> Still, I do like the approach.
>> This would be an argument for going the extra mile and having the loose
>> objects adopt the timestamp of their pack file. In the normal case they
>> would probably be pruned immediately during the same git-gc run.
> 
> Well, not necessarily.  If you created a large branch yesterday and you 
> are deleting it today, then if you repacked in between means that those 
> loose objects won't be more than one day old.  Yet there could be enough 
> of them to trigger auto gc.  But that auto gc won't pack those objects 
> since they are unreferenced.  Hence auto gc will trigger all the time 
> without making any progress.

That's true, but the intermediate repack is not the cause here. You'd be
in the same situation if a large branch was created yesterday and then
deleted today even if packing had never occurred.

I do see your point, but you should have said a large branch created a month
ago, deleted today, but repacked yesterday. :)

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09 16:12                                   ` Nicolas Pitre
  2008-05-09 16:54                                     ` Brandon Casey
@ 2008-05-09 22:33                                     ` Junio C Hamano
  2008-05-09 23:09                                       ` [PATCH] Updating documentation to match Brandon Casey's proposed git-repack patch Chris Frey
  2008-05-10  0:07                                       ` git gc & deleted branches Jeremy Maitin-Shepard
  1 sibling, 2 replies; 50+ messages in thread
From: Junio C Hamano @ 2008-05-09 22:33 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Brandon Casey, Geert Bosch, Jeff King, git

Nicolas Pitre <nico@cam.org> writes:

> On Fri, 9 May 2008, Brandon Casey wrote:
>
>> Unreferenced objects are sometimes used by other repositories which have
>> this repository listed as an alternate. So it may not be a good idea to
>> make the unreferenced objects inaccessible.
>
> Nah.  If this is really the case then you shouldn't be running gc at all 
> in the first place.

True.

I think the true motivation behind --keep-unreachable is not about the
shared object store (aka "alternates") but about races between gc and
push (or fetch).  Before push (or fetch) finishes and updates refs, the
new objects they create would be dangling _and_ the objects these dangling
objects refer to may be packed but unreferenced.  Repacking unreferenced
packed objects was a way to avoid losing them.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] Updating documentation to match Brandon Casey's proposed git-repack patch.
  2008-05-09 22:33                                     ` Junio C Hamano
@ 2008-05-09 23:09                                       ` Chris Frey
  2008-05-10  0:07                                       ` git gc & deleted branches Jeremy Maitin-Shepard
  1 sibling, 0 replies; 50+ messages in thread
From: Chris Frey @ 2008-05-09 23:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Brandon Casey, Geert Bosch, Jeff King, git

On Fri, May 09, 2008 at 03:33:41PM -0700, Junio C Hamano wrote:
> I think the true motivation behind --keep-unreachable is not about the
> shared object store (aka "alternates") but about races between gc and
> push (or fetch).  Before push (or fetch) finishes and updates refs, the
> new objects they create would be dangling _and_ the objects these dangling
> objects refer to may be packed but unreferenced.  Repacking unreferenced
> packed objects was a way to avoid losing them.

This is what the log history seems to indicate:

	git log -p --grep=keep-unreach

So pack-objects --keep-unreachable was implemented in order to add repack -A,
which now doesn't need --keep-unreachable, and can become obsolete.

Which is just as well, since --keep-unreachable never made it to the
man pages. :-)

If I understand things correctly, there is no user-friendly way to add
loose, unreachable objects to a pack.  This whole architecture was just
to prevent a repack from silently deleting things.

If this is right, the patch below updates the docs.

- Chris


>From 443b1201d54f0b7197d18779ce934823e9897b36 Mon Sep 17 00:00:00 2001
From: Chris Frey <cdfrey@foursquare.net>
Date: Fri, 9 May 2008 19:08:26 -0400
Subject: [PATCH] Updating documentation to match Brandon Casey's proposed git-repack patch.

This patch clarifies the git-prune man page, documenting that it only
prunes unpacked objects.  git-repack is documented according to
the new git-repack -A behaviour, which does not depend on
git-pack-objects --keep-unreachable anymore.

Signed-off-by: Chris Frey <cdfrey@foursquare.net>
---
 Documentation/git-prune.txt  |    5 ++++-
 Documentation/git-repack.txt |   14 +++++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index f92bb8c..3178bc4 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -18,12 +18,15 @@ git-prune. See the section "NOTES", below.
 
 This runs `git-fsck --unreachable` using all the refs
 available in `$GIT_DIR/refs`, optionally with additional set of
-objects specified on the command line, and prunes all
+objects specified on the command line, and prunes all unpacked
 objects unreachable from any of these head objects from the object database.
 In addition, it
 prunes the unpacked objects that are also found in packs by
 running `git prune-packed`.
 
+Note that unreachable, packed objects will remain.  If this is
+not desired, see linkgit:git-repack[1].
+
 OPTIONS
 -------
 
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 3d95749..906d3c7 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -8,7 +8,7 @@ git-repack - Pack unpacked objects in a repository
 
 SYNOPSIS
 --------
-'git-repack' [-a] [-d] [-f] [-l] [-n] [-q] [--window=N] [--depth=N]
+'git-repack' [-a] [-A] [-d] [-f] [-l] [-n] [-q] [--window=N] [--depth=N]
 
 DESCRIPTION
 -----------
@@ -37,6 +37,18 @@ OPTIONS
 	leaves behind, but `git fsck --full` shows as
 	dangling.
 
+-A::
+	Same as `-a`, but any unreachable objects in a previous
+	pack become loose, unpacked objects, instead of being
+	left in the old pack.  Unreachable objects are never
+	intentionally added to a pack, even when repacking.
+	When used with '-d', this option
+	prevents unreachable objects from being immediately
+	deleted by way of being left in the old pack and then
+	removed.  Instead, the loose unreachable objects
+	will be pruned according to normal expiry rules
+	with the next linkgit:git-gc[1].
+
 -d::
 	After packing, if the newly created packs make some
 	existing packs redundant, remove the redundant packs.
-- 
1.5.4.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-09 22:33                                     ` Junio C Hamano
  2008-05-09 23:09                                       ` [PATCH] Updating documentation to match Brandon Casey's proposed git-repack patch Chris Frey
@ 2008-05-10  0:07                                       ` Jeremy Maitin-Shepard
  2008-05-10  0:20                                         ` Shawn O. Pearce
  1 sibling, 1 reply; 50+ messages in thread
From: Jeremy Maitin-Shepard @ 2008-05-10  0:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Brandon Casey, Geert Bosch, Jeff King, git

Junio C Hamano <gitster@pobox.com> writes:

> Nicolas Pitre <nico@cam.org> writes:
>> On Fri, 9 May 2008, Brandon Casey wrote:
>> 
>>> Unreferenced objects are sometimes used by other repositories which have
>>> this repository listed as an alternate. So it may not be a good idea to
>>> make the unreferenced objects inaccessible.
>> 
>> Nah.  If this is really the case then you shouldn't be running gc at all 
>> in the first place.

> True.

> I think the true motivation behind --keep-unreachable is not about the
> shared object store (aka "alternates") but about races between gc and
> push (or fetch).  Before push (or fetch) finishes and updates refs, the
> new objects they create would be dangling _and_ the objects these dangling
> objects refer to may be packed but unreferenced.  Repacking unreferenced
> packed objects was a way to avoid losing them.

I feel like the current approach of (not very well) keeping track of
which objects are still needed is very messy, not very well defined or
based on specific solid principles, and prone to errors and losing
objects.

Things like git clone -shared can only really be used in extremely
specialized setups, or if pruning of unreferenced objects is completely
disabled in the source repository, or if specialized scripts are used to
do the garbage collection that take into account the references of the
"child" repository.  It is my impression that even repo.or.cz, while it
has some safe guards, does not even completely safely handle garbage
collection.  Probably it would be very useful to examples of such
scripts in contrib.

I think that ultimately, some general purpose and reliable solution
needs to be found to handle the cases of (1) a repository having its
objects referenced by another via info/alternates; (2) a repository with
multiple working directories (presumably this should warn/error out
unless given a force option/detach head and warn if you try to switch
HEAD for some working directory to the same branch as some other working
directory).  It seems, btw, that a third type of clone, one which merely
symlinks the objects directory, would also be useful, once there is a
solution to the robustness issue.  This would be a case (3) that needs
to be handled as well.

It seems that clear that ultimately, to handle these three cases, every
repository needs to know about every other repository, probably via a
symlink to other repository's .git directory.  Git gc would then also
examine any refs in this directory, making sure to avoid circular
references that might result from following the symlinks.  It should
also probably error out if it finds a symlink that doesn't point to a
valid git repository, because such a symlink either refers to a
now-deleted repository for which the symlink needs to be cleaned up, or
it refers to a repository that was moved and therefore the symlink needs
to be updated.  Simply ignoring invalid symlinks could result in pruning
objects that need to be kept for repositories that have moved.

It is extremely cumbersome to have to worry about whether there are
other concurrent accesses to the repository when running e.g. git gc.
For servers, you may never be able to guarantee that nothing else is
accessing the repository concurrently.  Here is a possible solution:

Each git process creates a log file of the references that it has
created.  The log file should be named in some way with e.g. the process
id and start time of the process, and simply consist of a list of
20-byte sha1 hashes to be considered additional in-use references for
the purpose of garbage collection.  The log file would be cleaned up
when the process exits, and would also be deleted by any instance of git
gc that notices a stale log file that doesn't correspond to a running
process.  To handle shell scripts that need to deal with git-hash-object
directly, git hash-object could be passed maybe a file descriptor or
filename of a log file to use instead of creating one.  Maybe the log
file format could be more complicated, and also support paths to
e.g. alternate index files to also consider for references.  Things
would need to be one so that race conditions do not occur, but I think
something like this would work.

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  0:07                                       ` git gc & deleted branches Jeremy Maitin-Shepard
@ 2008-05-10  0:20                                         ` Shawn O. Pearce
  2008-05-10  0:43                                           ` Jeremy Maitin-Shepard
  2008-05-10  1:21                                           ` Junio C Hamano
  0 siblings, 2 replies; 50+ messages in thread
From: Shawn O. Pearce @ 2008-05-10  0:20 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: Junio C Hamano, Nicolas Pitre, Brandon Casey, Geert Bosch,
	Jeff King, git

Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
> It is extremely cumbersome to have to worry about whether there are
> other concurrent accesses to the repository when running e.g. git gc.
> For servers, you may never be able to guarantee that nothing else is
> accessing the repository concurrently.  Here is a possible solution:
> 
> Each git process creates a log file of the references that it has
> created.  The log file should be named in some way with e.g. the process
> id and start time of the process, and simply consist of a list of
> 20-byte sha1 hashes to be considered additional in-use references for
> the purpose of garbage collection.

I believe we partially considered that in the past and discarded it
as far too complex implementation-wise for the benefit it gives us.

The current approach of leaving unreachable loose objects around
for 2 weeks is good enough.  Any Git process that has been running
for 2 weeks while still not linking everything it needs into the
reachable refs of that repository is already braindamaged and
shouldn't be running anymore.

If we are dealing with a pack file, those are protected by .keep
"lock files" between the time they are created on disk and the
time that the git-fetch or git-receive-pack process has finished
updating the refs to anchor the pack's contents as reachable.
Every once in a while a stale .keep file gets left behind when a
process gets killed by the OS, and its damn annoying to clean up.

I'd hate to clean up logs from every little git-add or git-commit
that aborted in the middle uncleanly.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  0:20                                         ` Shawn O. Pearce
@ 2008-05-10  0:43                                           ` Jeremy Maitin-Shepard
  2008-05-10  1:21                                           ` Junio C Hamano
  1 sibling, 0 replies; 50+ messages in thread
From: Jeremy Maitin-Shepard @ 2008-05-10  0:43 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Nicolas Pitre, Brandon Casey, Geert Bosch,
	Jeff King, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
>> It is extremely cumbersome to have to worry about whether there are
>> other concurrent accesses to the repository when running e.g. git gc.
>> For servers, you may never be able to guarantee that nothing else is
>> accessing the repository concurrently.  Here is a possible solution:
>> 
>> Each git process creates a log file of the references that it has
>> created.  The log file should be named in some way with e.g. the process
>> id and start time of the process, and simply consist of a list of
>> 20-byte sha1 hashes to be considered additional in-use references for
>> the purpose of garbage collection.

> I believe we partially considered that in the past and discarded it
> as far too complex implementation-wise for the benefit it gives us.

It doesn't seem all that complex, and I'd say that fundamentally it is
the _correct_ way to do things.  Being sloppy is always easier in the
short run, but then either means the system is permanently broken or
results in a lot of "fixing up" work later.  I think almost all of the
work of handling these log files could be done without impacting a lot
of code that calls the relevant APIs that would actually use the log
files.  I think the biggest impact would be on non-C code, but even for
that code, appropriate wrapper could be used to avoid having to make
many changes.

> The current approach of leaving unreachable loose objects around
> for 2 weeks is good enough.  Any Git process that has been running
> for 2 weeks while still not linking everything it needs into the
> reachable refs of that repository is already braindamaged and
> shouldn't be running anymore.

This sort of reasoning just leads to an inherently unreliable system.
Sure, two weeks might seem good enough for nearly all cases, but why
_shouldn't_ I be able to leave my editor open for two weeks before
typing in my commit message and finishing the commit, or wait for two
weeks in the middle of a rebase (it seems that in the new
implementation, temporary refs are created basically to do the same
thing as the log file I described.)  I could easily be typing up my
commit message, then switch to something else, and happen not to come
back to it for two weeks.

Because such a "timeout" based solution isn't really the "correct
solution" but will work most of the time, potential problems won't be
noticed while testing.

Another significant issue is that this timeout means that unreferenced
junk has to stay around in the repository for two weeks for no (good)
reason.

> If we are dealing with a pack file, those are protected by .keep
> "lock files" between the time they are created on disk and the
> time that the git-fetch or git-receive-pack process has finished
> updating the refs to anchor the pack's contents as reachable.
> Every once in a while a stale .keep file gets left behind when a
> process gets killed by the OS, and its damn annoying to clean up.

> I'd hate to clean up logs from every little git-add or git-commit
> that aborted in the middle uncleanly.

First of all, merely exiting due to an error should not cause log files
to be left around.  The only thing that should cause log files to be
left around is kill -9 or a system crash.  Second, by storing the
process id and a timestamp of when the log file was created, it is
possible to reliably determine if a log file is stale.

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  0:20                                         ` Shawn O. Pearce
  2008-05-10  0:43                                           ` Jeremy Maitin-Shepard
@ 2008-05-10  1:21                                           ` Junio C Hamano
  2008-05-10  1:51                                             ` Jeremy Maitin-Shepard
  1 sibling, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2008-05-10  1:21 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Jeremy Maitin-Shepard, Nicolas Pitre, Brandon Casey, Geert Bosch,
	Jeff King, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
>> It is extremely cumbersome to have to worry about whether there are
>> other concurrent accesses to the repository when running e.g. git gc.
>> For servers, you may never be able to guarantee that nothing else is
>> accessing the repository concurrently.  Here is a possible solution:
>> 
>> Each git process creates a log file of the references that it has
>> created.  The log file should be named in some way with e.g. the process
>> id and start time of the process, and simply consist of a list of
>> 20-byte sha1 hashes to be considered additional in-use references for
>> the purpose of garbage collection.

How would that solve the issue that you should not prune/gc the repository
"clone --shared" aka "alternates" borrows from?

By the way, I do not think your "git-commit stopped for two weeks due to a
long editing session of the commit message" should result in any object
lossage, as the new objects are all reachable from the index, and the new
tree nor the new commit hasn't been built while you are typing (rather,
not typing) the log message.

Hmm, a partial commit that uses a temporary index file may lose, come to
think of it.  Perhaps we should teach reachable.c about the temporary
index file as well.  I dunno.
 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  1:21                                           ` Junio C Hamano
@ 2008-05-10  1:51                                             ` Jeremy Maitin-Shepard
  2008-05-10  5:25                                               ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Jeremy Maitin-Shepard @ 2008-05-10  1:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn O. Pearce, Nicolas Pitre, Brandon Casey, Geert Bosch,
	Jeff King, git

Junio C Hamano <gitster@pobox.com> writes:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
>> Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
>>> It is extremely cumbersome to have to worry about whether there are
>>> other concurrent accesses to the repository when running e.g. git gc.
>>> For servers, you may never be able to guarantee that nothing else is
>>> accessing the repository concurrently.  Here is a possible solution:
>>> 
>>> Each git process creates a log file of the references that it has
>>> created.  The log file should be named in some way with e.g. the process
>>> id and start time of the process, and simply consist of a list of
>>> 20-byte sha1 hashes to be considered additional in-use references for
>>> the purpose of garbage collection.

> How would that solve the issue that you should not prune/gc the repository
> "clone --shared" aka "alternates" borrows from?

The log files are only for handling in-progress commands editing the
repository.  I also describe in first part of the e-mail a possible
solution to that issue as well as the issues created by having multiple
working directories:

When you create a new working directory, you would also create in the
original repository a symlink named
e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that
points to the .git directory of the newly created working directory.
git clone -shared would likewise create such a link in the original
repository.  There could be a separate simple command to "destroy" a
repository created via clone -shared or via new-work-dir that would
simply remove this "peer" symlink from any repositories it shares from,
and then rm -rf the target repository.  The list of repositories that a
given target repository shares from would be discovered using perhaps
several different methods, depending on whether it is a new work dir, an
actual separate repository, or the new type of "shared" repository I
suggested in my original e-mail, namely one that has its own refs but
completely shares the object store of the original repository, e.g. via
a symlink to the original repository's objects directory In any case, I
believe the information to go "upstream" is already available, and we
just need to add those "peer" symlinks in order to be able to go
"downstream".

There could also be a simple git command to move a repository that would
take care of updating all of the references that other repositories have
to it.  Currently it is not possible to write such a command, because
the "downstream" links are not stored, but with these added symlinks it
would be possible.

As I said in my previous e-mail, if git gc finds any broken symlinks
(i.e. symlinks that point to invalid repositories), it would error out,
because user attention is required to specify whether the symlinks
correspond to deleted repositories, or to repositories that have been
moved without making the proper updates.

> By the way, I do not think your "git-commit stopped for two weeks due to a
> long editing session of the commit message" should result in any object
> lossage, as the new objects are all reachable from the index, and the new
> tree nor the new commit hasn't been built while you are typing (rather,
> not typing) the log message.

> Hmm, a partial commit that uses a temporary index file may lose, come to
> think of it.  Perhaps we should teach reachable.c about the temporary
> index file as well.  I dunno.

Well, providing a generic mechanism for telling git about reachable
things other than the index and refs is precisely what these log files
would do, and also because they would record the process id and a
timestamp, stale log files would automatically get cleaned up.  If each
individual git command has its own special way of trying to keep track
of temporary references, it is just going to be more complicated and
more error prone.

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
       [not found]                                 ` <7v1w4bb291.fsf@gitster.siamese.dyndns.org>
@ 2008-05-10  3:32                                   ` Brandon Casey
  2008-05-10  4:15                                     ` Brandon Casey
  0 siblings, 1 reply; 50+ messages in thread
From: Brandon Casey @ 2008-05-10  3:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

I didn't read your email until now. It may sound strange, but I can't
read my personal email from work.

Also, looks like I inadvertantly took this message off-list.

On Fri, May 9, 2008 at 4:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> "Brandon Casey" <drafnel@gmail.com> writes:
>
>> Is this invocation of pack-objects that you commented on necessary
>> though? This is the section that is replacing an existing pack file
>> with a newly created pack file of the same name...
>
> Yuck, I failed to see it in the context.  You are right.  This won't do
> anything I suspect.

Now I'm confused. I hope you realized I said "pack-objects" when I
meant "unpack-objects".

It was only the first invocation of unpack-objects that I thought may
be unnecessary. This is because pack-objects (yes, pack) created a new
pack with
the same name as an existing pack which means (if I'm thinking about things
correctly) that none of the objects inside the pack are unreachable. So trying
to unpack-objects is a waste of time.

There is a second invocation of unpack-objects which is run on packs which
will be deleted. This one should unpack any unreferenced objects from each pack.

-brandon


The remainder of your email is below since the list has never seen it.


> Perhaps we would want a new option --eject that causes pack-object write
> out unreachable packed objects in the loose format?  This would require a
> minor surgery to write_sha1_file() so that it won't pay attention to the
> return value of has_sha1_file(sha1), perhaps like this.
>
> This is a lunch-time hack and needs more serious work, such as fixing
> horrible "write_sha1_file_1" to something more sensible.
>
> Also it should add --eject as an independent and incompatible option
> instead of dropping --keep-unpacked support.
>
> ---
>
>  builtin-pack-objects.c |   56 ++++++++++++-----------------------------------
>  sha1_file.c            |    9 ++++++-
>  2 files changed, 22 insertions(+), 43 deletions(-)
>
> diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> index 777f272..8240a4b 100644
> --- a/builtin-pack-objects.c
> +++ b/builtin-pack-objects.c
> @@ -1836,38 +1836,25 @@ struct in_pack {
>        struct in_pack_object *array;
>  };
>
> -static void mark_in_pack_object(struct object *object, struct packed_git *p, struct in_pack *in_pack)
> +static void eject_object(struct object *obj)
>  {
> -       in_pack->array[in_pack->nr].offset = find_pack_entry_one(object->sha1, p);
> -       in_pack->array[in_pack->nr].object = object;
> -       in_pack->nr++;
> -}
> -
> -/*
> - * Compare the objects in the offset order, in order to emulate the
> - * "git-rev-list --objects" output that produced the pack originally.
> - */
> -static int ofscmp(const void *a_, const void *b_)
> -{
> -       struct in_pack_object *a = (struct in_pack_object *)a_;
> -       struct in_pack_object *b = (struct in_pack_object *)b_;
> +       enum object_type type;
> +       unsigned long size;
> +       void *data;
> +       unsigned char sha1[20];
>
> -       if (a->offset < b->offset)
> -               return -1;
> -       else if (a->offset > b->offset)
> -               return 1;
> -       else
> -               return hashcmp(a->object->sha1, b->object->sha1);
> +       data = read_sha1_file(obj->sha1, &type, &size);
> +       assert(data && type == obj->type);
> +       if (write_sha1_file_1(data, size, typename(type), sha1, 1))
> +               die("cannot eject packed object %s", sha1_to_hex(obj->sha1));
> +       assert(!memcmp(sha1, obj->sha1));
>  }
>
> -static void add_objects_in_unpacked_packs(struct rev_info *revs)
> +static void eject_objects_in_unpacked_packs(struct rev_info *revs)
>  {
>        struct packed_git *p;
> -       struct in_pack in_pack;
>        uint32_t i;
>
> -       memset(&in_pack, 0, sizeof(in_pack));
> -
>        for (p = packed_git; p; p = p->next) {
>                const unsigned char *sha1;
>                struct object *o;
> @@ -1881,28 +1868,15 @@ static void add_objects_in_unpacked_packs(struct rev_info *revs)
>                if (open_pack_index(p))
>                        die("cannot open pack index");
>
> -               ALLOC_GROW(in_pack.array,
> -                          in_pack.nr + p->num_objects,
> -                          in_pack.alloc);
> -
>                for (i = 0; i < p->num_objects; i++) {
>                        sha1 = nth_packed_object_sha1(p, i);
>                        o = lookup_unknown_object(sha1);
> -                       if (!(o->flags & OBJECT_ADDED))
> -                               mark_in_pack_object(o, p, &in_pack);
> +                       if (o->flags & OBJECT_ADDED)
> +                               continue;
>                        o->flags |= OBJECT_ADDED;
> +                       eject_object(o);
>                }
>        }
> -
> -       if (in_pack.nr) {
> -               qsort(in_pack.array, in_pack.nr, sizeof(in_pack.array[0]),
> -                     ofscmp);
> -               for (i = 0; i < in_pack.nr; i++) {
> -                       struct object *o = in_pack.array[i].object;
> -                       add_object_entry(o->sha1, o->type, "", 0);
> -               }
> -       }
> -       free(in_pack.array);
>  }
>
>  static void get_object_list(int ac, const char **av)
> @@ -1938,7 +1912,7 @@ static void get_object_list(int ac, const char **av)
>        traverse_commit_list(&revs, show_commit, show_object);
>
>        if (keep_unreachable)
> -               add_objects_in_unpacked_packs(&revs);
> +               eject_objects_in_unpacked_packs(&revs);
>  }
>
>  static int adjust_perm(const char *path, mode_t mode)
> diff --git a/sha1_file.c b/sha1_file.c
> index 3516777..ce3a661 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2102,7 +2102,7 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
>        return 0;
>  }
>
> -int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
> +int write_sha1_file_1(void *buf, unsigned long len, const char *type, unsigned char *returnsha1, int refresh)
>  {
>        int size, ret;
>        unsigned char *compressed;
> @@ -2120,7 +2120,7 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
>        filename = sha1_file_name(sha1);
>        if (returnsha1)
>                hashcpy(returnsha1, sha1);
> -       if (has_sha1_file(sha1))
> +       if (!refresh && has_sha1_file(sha1))
>                return 0;
>        fd = open(filename, O_RDONLY);
>        if (fd >= 0) {
> @@ -2185,6 +2185,11 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
>        return move_temp_to_file(tmpfile, filename);
>  }
>
> +int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
> +{
> +       return write_sha1_file_1(buf, len, type, returnsha1, 0);
> +}
> +
>  /*
>  * We need to unpack and recompress the object for writing
>  * it out to a different file.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 0/3] leave unreferenced objects unpacked
  2008-05-09  3:21                             ` Junio C Hamano
       [not found]                               ` <ee63ef30805082105w7f04a2d1y65a4618aeb787cac@mail.gmail.com>
@ 2008-05-10  4:01                               ` drafnel
  2008-05-10  4:01                               ` [PATCH 1/3] repack: modify behavior of -A option to " drafnel
                                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: drafnel @ 2008-05-10  4:01 UTC (permalink / raw)
  To: gitster; +Cc: git

Here is a formal patch. I removed the first invocation of
unpack-objects since I think it is unneccessary.

Followed up with mods to git-gc.

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/3] repack: modify behavior of -A option to leave unreferenced objects unpacked
  2008-05-09  3:21                             ` Junio C Hamano
       [not found]                               ` <ee63ef30805082105w7f04a2d1y65a4618aeb787cac@mail.gmail.com>
  2008-05-10  4:01                               ` [PATCH 0/3] leave unreferenced objects unpacked drafnel
@ 2008-05-10  4:01                               ` drafnel
  2008-05-10  6:03                                 ` Jeff King
  2008-05-10  4:01                               ` [PATCH 2/3] git-gc: always use -A when manually repacking drafnel
  2008-05-10  4:01                               ` [PATCH 3/3] builtin-gc.c: deprecate --prune, it now really has no effect drafnel
  4 siblings, 1 reply; 50+ messages in thread
From: drafnel @ 2008-05-10  4:01 UTC (permalink / raw)
  To: gitster; +Cc: git, Brandon Casey

From: Brandon Casey <drafnel@gmail.com>

The previous behavior of the -A option was to retain any previously
packed objects which had become unreferenced, and place them into the newly
created pack file.  Since git-gc, when run automatically with the --auto
option, calls repack with the -A option, this had the effect of retaining
unreferenced packed objects indefinitely. To avoid this scenario, the
user was required to run git-gc with the little known --prune option or
to manually run repack with the -a option.

This patch changes the behavior of the -A option so that unreferenced
objects that exist in any pack file being replaced, will be unpacked into
the repository. The unreferenced loose objects can then be garbage collected
by git-gc (i.e. git-prune) based on the gc.pruneExpire setting.

Also add new tests for checking whether unreferenced objects which were
previously packed are properly left in the repository unpacked after
repacking.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
---
 git-repack.sh                        |   18 +++++++++---
 t/t7701-repack-unpack-unreachable.sh |   47 ++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 5 deletions(-)
 create mode 100755 t/t7701-repack-unpack-unreachable.sh

diff --git a/git-repack.sh b/git-repack.sh
index e18eb3f..a0e06ed 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -30,7 +30,7 @@ do
 	-n)	no_update_info=t ;;
 	-a)	all_into_one=t ;;
 	-A)	all_into_one=t
-		keep_unreachable=--keep-unreachable ;;
+		keep_unreachable=t ;;
 	-d)	remove_redundant=t ;;
 	-q)	quiet=-q ;;
 	-f)	no_reuse=--no-reuse-object ;;
@@ -78,9 +78,6 @@ case ",$all_into_one," in
 	if test -z "$args"
 	then
 		args='--unpacked --incremental'
-	elif test -n "$keep_unreachable"
-	then
-		args="$args $keep_unreachable"
 	fi
 	;;
 esac
@@ -130,7 +127,18 @@ then
 		  do
 			case " $fullbases " in
 			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+			*)
+				rm -f "$e.idx" "$e.keep"
+				if test -n "$keep_unreachable" &&
+				   test -f "$e.pack"
+				then
+					git unpack-objects < "$e.pack" || {
+						echo >&2 "Failed unpacking unreachable objects from redundant pack file $e.pack"
+						exit 1
+					}
+				fi
+				rm -f "$e.pack"
+			;;
 			esac
 		  done
 		)
diff --git a/t/t7701-repack-unpack-unreachable.sh b/t/t7701-repack-unpack-unreachable.sh
new file mode 100755
index 0000000..6a5211f
--- /dev/null
+++ b/t/t7701-repack-unpack-unreachable.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description='git-repack works correctly'
+
+. ./test-lib.sh
+
+test_expect_success '-A option leaves unreachable objects unpacked' '
+	echo content > file1 &&
+	git add . &&
+	git commit -m initial_commit &&
+	# create a transient branch with unique content
+	git checkout -b transient_branch &&
+	echo more content >> file1 &&
+	# record the objects created in the database for file, commit, tree
+	fsha1=$(git hash-object file1) &&
+	git commit -a -m more_content &&
+	csha1=$(git rev-parse HEAD^{commit}) &&
+	tsha1=$(git rev-parse HEAD^{tree}) &&
+	git checkout master &&
+	echo even more content >> file1 &&
+	git commit -a -m even_more_content &&
+	# delete the transient branch
+	git branch -D transient_branch &&
+	# pack the repo
+	git repack -A -d -l &&
+	# verify objects are packed in repository
+	test 3 = $(git verify-pack -v -- .git/objects/pack/*.idx |
+		   grep -e "^$fsha1 " -e "^$csha1 " -e "^$tsha1 " |
+		   sort | uniq | wc -l) &&
+	git show $fsha1 &&
+	git show $csha1 &&
+	git show $tsha1 &&
+	# now expire the reflog
+	sleep 1 &&
+	git reflog expire --expire-unreachable=now --all &&
+	# and repack
+	git repack -A -d -l &&
+	# verify objects are retained unpacked
+	test 0 = $(git verify-pack -v -- .git/objects/pack/*.idx |
+		   grep -e "^$fsha1 " -e "^$csha1 " -e "^$tsha1 " |
+		   sort | uniq | wc -l) &&
+	git show $fsha1 &&
+	git show $csha1 &&
+	git show $tsha1
+'
+
+test_done
-- 
1.5.5.67.g9a49

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/3] git-gc: always use -A when manually repacking
  2008-05-09  3:21                             ` Junio C Hamano
                                                 ` (2 preceding siblings ...)
  2008-05-10  4:01                               ` [PATCH 1/3] repack: modify behavior of -A option to " drafnel
@ 2008-05-10  4:01                               ` drafnel
  2008-05-10  4:01                               ` [PATCH 3/3] builtin-gc.c: deprecate --prune, it now really has no effect drafnel
  4 siblings, 0 replies; 50+ messages in thread
From: drafnel @ 2008-05-10  4:01 UTC (permalink / raw)
  To: gitster; +Cc: git, Brandon Casey

From: Brandon Casey <drafnel@gmail.com>

Now that repack -A will leave unreferenced objects unpacked, there is
no reason to use the -a option to repack (which will discard unreferenced
objects). The unpacked unreferenced objects will not be repacked by a
subsequent repack, and will eventually be pruned by git-gc based on the
gc.pruneExpire config option.
---
 builtin-gc.c |   13 ++-----------
 1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index f99ebc7..6db2f51 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -256,17 +256,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			"performance. You may also\n"
 			"run \"git gc\" manually. See "
 			"\"git help gc\" for more information.\n");
-	} else {
-		/*
-		 * Use safer (for shared repos) "-A" option to
-		 * repack when not pruning. Auto-gc makes its
-		 * own decision.
-		 */
-		if (prune)
-			append_option(argv_repack, "-a", MAX_ADD);
-		else
-			append_option(argv_repack, "-A", MAX_ADD);
-	}
+	} else
+		append_option(argv_repack, "-A", MAX_ADD);
 
 	if (pack_refs && run_command_v_opt(argv_pack_refs, RUN_GIT_CMD))
 		return error(FAILED_RUN, argv_pack_refs[0]);
-- 
1.5.5.67.g9a49

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/3] builtin-gc.c: deprecate --prune, it now really has no effect
  2008-05-09  3:21                             ` Junio C Hamano
                                                 ` (3 preceding siblings ...)
  2008-05-10  4:01                               ` [PATCH 2/3] git-gc: always use -A when manually repacking drafnel
@ 2008-05-10  4:01                               ` drafnel
  4 siblings, 0 replies; 50+ messages in thread
From: drafnel @ 2008-05-10  4:01 UTC (permalink / raw)
  To: gitster; +Cc: git, Brandon Casey

From: Brandon Casey <drafnel@gmail.com>

---
 builtin-gc.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index 6db2f51..48f7d95 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -219,7 +219,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	char buf[80];
 
 	struct option builtin_gc_options[] = {
-		OPT_BOOLEAN(0, "prune", &prune, "prune unreferenced objects"),
+		OPT_BOOLEAN(0, "prune", &prune, "prune unreferenced objects (deprecated)"),
 		OPT_BOOLEAN(0, "aggressive", &aggressive, "be more thorough (increased runtime)"),
 		OPT_BOOLEAN(0, "auto", &auto_gc, "enable auto-gc mode"),
 		OPT_BOOLEAN('q', "quiet", &quiet, "suppress progress reports"),
@@ -249,7 +249,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		/*
 		 * Auto-gc should be least intrusive as possible.
 		 */
-		prune = 0;
 		if (!need_to_gc())
 			return 0;
 		fprintf(stderr, "Auto packing your repository for optimum "
-- 
1.5.5.67.g9a49

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  3:32                                   ` Brandon Casey
@ 2008-05-10  4:15                                     ` Brandon Casey
  0 siblings, 0 replies; 50+ messages in thread
From: Brandon Casey @ 2008-05-10  4:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, May 9, 2008 at 10:32 PM, Brandon Casey <drafnel@gmail.com> wrote:
> On Fri, May 9, 2008 at 4:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> The remainder of your email is below since the list has never seen it.
>
>
>> Perhaps we would want a new option --eject that causes pack-object write
>> out unreachable packed objects in the loose format?

Two comments on this.

1) If you decide to go in this direction, I think the option should be
    named --unpack-unreachable rather than --eject.
2) There is some part of me that is against a program named
    pack-objects doing unpacking. But it is probably more efficient
    this way.

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  1:51                                             ` Jeremy Maitin-Shepard
@ 2008-05-10  5:25                                               ` Jeff King
  2008-05-10  5:36                                                 ` Jeremy Maitin-Shepard
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2008-05-10  5:25 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: Junio C Hamano, Shawn O. Pearce, Nicolas Pitre, Brandon Casey,
	Geert Bosch, git

On Fri, May 09, 2008 at 09:51:15PM -0400, Jeremy Maitin-Shepard wrote:

> When you create a new working directory, you would also create in the
> original repository a symlink named
> e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that
> points to the .git directory of the newly created working directory.

That assumes you _can_ write to the original repository. That may or may
not be the case, depending on your setup.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  5:25                                               ` Jeff King
@ 2008-05-10  5:36                                                 ` Jeremy Maitin-Shepard
  2008-05-10  9:04                                                   ` Johannes Schindelin
  0 siblings, 1 reply; 50+ messages in thread
From: Jeremy Maitin-Shepard @ 2008-05-10  5:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Shawn O. Pearce, Nicolas Pitre, Brandon Casey,
	Geert Bosch, git

Jeff King <peff@peff.net> writes:

> On Fri, May 09, 2008 at 09:51:15PM -0400, Jeremy Maitin-Shepard wrote:
>> When you create a new working directory, you would also create in the
>> original repository a symlink named
>> e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that
>> points to the .git directory of the newly created working directory.

> That assumes you _can_ write to the original repository. That may or may
> not be the case, depending on your setup.

Well, I suppose in that case it could print a warning or maybe fail
without some "force" option.  If you can't write to the repository, then
I think it is safe to say that it will never know or care about you, so
you will fundamentally have a fragile setup.  I'd say that except in
very special circumstances, you are better off just not sharing it at
all.

Consider, for instance, that even if the repository that you are sharing
form never deletes branches and never does non-fast-forward updates of
references, it could very well happen to have, due to some temporary
operation, some unreferenced object that happens to be exactly the same
object that you want to add to your repository.  Because you've listed
it in info/alternates, you won't write that object to your own
repository, but then the source repository will very likely garbage
collect the object at some later point, corrupting your repository.

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/3] repack: modify behavior of -A option to leave unreferenced objects unpacked
  2008-05-10  4:01                               ` [PATCH 1/3] repack: modify behavior of -A option to " drafnel
@ 2008-05-10  6:03                                 ` Jeff King
  2008-05-11  1:10                                   ` Nicolas Pitre
  2008-05-11  4:16                                   ` Brandon Casey
  0 siblings, 2 replies; 50+ messages in thread
From: Jeff King @ 2008-05-10  6:03 UTC (permalink / raw)
  To: drafnel; +Cc: gitster, git

On Fri, May 09, 2008 at 11:01:55PM -0500, drafnel@gmail.com wrote:

> -		keep_unreachable=--keep-unreachable ;;
> +		keep_unreachable=t ;;

Can we call this something else (like unpack_unreachable) since it now
has nothing to do with the --keep-unreachable flag?

Also, should --keep-unreachable be deprecated / removed?

> +			*)
> +				rm -f "$e.idx" "$e.keep"
> +				if test -n "$keep_unreachable" &&
> +				   test -f "$e.pack"
> +				then
> +					git unpack-objects < "$e.pack" || {
> +						echo >&2 "Failed unpacking unreachable objects from redundant pack file $e.pack"
> +						exit 1
> +					}
> +				fi

I still like Geert's suggestion of unpacking them to a _different_
place. That helps to avoid spurious "gc --auto" invocations caused by
too many prunable objects. Though it certainly doesn't solve it, and
maybe that just needs to be fixed separately.

Possibly the "gc --auto" test should be:

  - count objects; if too few, exit
  - count unreachable loose objects; if too few, exit
  - run gc

That means having a lot of unreachable objects will still incur some
extra processing, but not as much as a full repack. And it won't bug the
user with a "you need to repack" message.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  5:36                                                 ` Jeremy Maitin-Shepard
@ 2008-05-10  9:04                                                   ` Johannes Schindelin
  2008-05-10 16:24                                                     ` Jeremy Maitin-Shepard
  0 siblings, 1 reply; 50+ messages in thread
From: Johannes Schindelin @ 2008-05-10  9:04 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: Jeff King, Junio C Hamano, Shawn O. Pearce, Nicolas Pitre,
	Brandon Casey, Geert Bosch, git

Hi,

On Sat, 10 May 2008, Jeremy Maitin-Shepard wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Fri, May 09, 2008 at 09:51:15PM -0400, Jeremy Maitin-Shepard wrote:

> >> When you create a new working directory, you would also create in the
> >> original repository a symlink named
> >> e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that
> >> points to the .git directory of the newly created working directory.
> 
> > That assumes you _can_ write to the original repository. That may or may
> > not be the case, depending on your setup.

FWIW this argument can be found in the mailing list.  It does not have to 
be told over and over again, right?

> Well, I suppose in that case it could print a warning or maybe fail 
> without some "force" option.  If you can't write to the repository, then 
> I think it is safe to say that it will never know or care about you, so 
> you will fundamentally have a fragile setup.  I'd say that except in 
> very special circumstances, you are better off just not sharing it at 
> all.

Counterexample kernel.org.  Counterexample repo.or.cz.

Hth,
Dscho

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10  9:04                                                   ` Johannes Schindelin
@ 2008-05-10 16:24                                                     ` Jeremy Maitin-Shepard
  2008-05-11 11:11                                                       ` Johannes Schindelin
  0 siblings, 1 reply; 50+ messages in thread
From: Jeremy Maitin-Shepard @ 2008-05-10 16:24 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Junio C Hamano, Shawn O. Pearce, Nicolas Pitre,
	Brandon Casey, Geert Bosch, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
> On Sat, 10 May 2008, Jeremy Maitin-Shepard wrote:

>> Jeff King <peff@peff.net> writes:
>> 
>> > On Fri, May 09, 2008 at 09:51:15PM -0400, Jeremy Maitin-Shepard wrote:

>> >> When you create a new working directory, you would also create in the
>> >> original repository a symlink named
>> >> e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that
>> >> points to the .git directory of the newly created working directory.
>> 
>> > That assumes you _can_ write to the original repository. That may or may
>> > not be the case, depending on your setup.

> FWIW this argument can be found in the mailing list.  It does not have to 
> be told over and over again, right?

Maybe you can point me at the relevant thread.  Fundamentally, though,
I'd say objects/info/alternates _cannot_ work reliably without the
source repository knowing about the objects that the sharing
repositories need.  Otherwise, there is no way for it to know not to
prune them.  The only way for it to have that information in general is
to write it in the repository.  In a site-specific setting, it may
indeed be possible to rely on some site-specific database, but that is
not particularly relevant.

Currently repository sharing seems to be used in many cases in quite
unsafe ways.  It may seem unfortunate that doing things the "safe way"
is much more of a hassle and doesn't work in certain environments, but
I'd say that is just the way things have to be.

Perhaps you can point me to an existing thread that addresses this idea,
though.

>> Well, I suppose in that case it could print a warning or maybe fail 
>> without some "force" option.  If you can't write to the repository, then 
>> I think it is safe to say that it will never know or care about you, so 
>> you will fundamentally have a fragile setup.  I'd say that except in 
>> very special circumstances, you are better off just not sharing it at 
>> all.

> Counterexample kernel.org.  Counterexample repo.or.cz.

repo.or.cz is not a counterexample.  It is completely "managed", and
could quite easily implement the approach I described.  I don't know
exactly how kernel.org works, but I imagine likewise some setuid helper
script could be provided to write these symlinks.

There is the issue that these setuid helper scripts would mean at the
very least that if user A can "fork" user B's repository, then to some
extent user B can make user A use large amounts of disk space
(i.e. exceed his quota or something) by just referencing a bunch of
temporary objects that user A happens to have in his repository, and it
would take careful examination of the git repository to actually figure
out that it is user B's fault.  I don't think this would be a
significant problem in practice, though.

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/3] repack: modify behavior of -A option to leave unreferenced objects unpacked
  2008-05-10  6:03                                 ` Jeff King
@ 2008-05-11  1:10                                   ` Nicolas Pitre
  2008-05-11  1:23                                     ` Junio C Hamano
  2008-05-11  4:16                                   ` Brandon Casey
  1 sibling, 1 reply; 50+ messages in thread
From: Nicolas Pitre @ 2008-05-11  1:10 UTC (permalink / raw)
  To: Jeff King; +Cc: drafnel, gitster, git

On Sat, 10 May 2008, Jeff King wrote:

> Also, should --keep-unreachable be deprecated / removed?

Depends.  If it has no maintenance cost then we might as well keep it 
around.

> I still like Geert's suggestion of unpacking them to a _different_
> place. That helps to avoid spurious "gc --auto" invocations caused by
> too many prunable objects. Though it certainly doesn't solve it, and
> maybe that just needs to be fixed separately.

Having a separate location for objects seems clunky to me.

And the fundamental problem isn't solved indeed -- you may end up with 
many non expired unreachable loose objects already without packing them.

> Possibly the "gc --auto" test should be:
> 
>   - count objects; if too few, exit
>   - count unreachable loose objects; if too few, exit

Determining the number of unreachable objects is quite costly, packed or 
not.  So that isn't a good thing to do on every 'git gc --auto' 
invokation.

>   - run gc
> 
> That means having a lot of unreachable objects will still incur some
> extra processing, but not as much as a full repack. And it won't bug the
> user with a "you need to repack" message.

The auto gc performs incremental packing most of the time.  And that is 
way faster than figuring out which objects are unreachable.

For example, running 'git prune' in my Linux repo takes 16 seconds, even 
when there is nothing to prune.  Running 'git repack' (with no option so 
to perform an incremental repack) took less than 2 seconds to pack 541 
reachable objects that happened to be loose.

I'm now starting to wonder if there is a reason for keeping unreachable 
objects that used to be packed.  Putting --keep-unreachable aside for 
now, the only way an unreachable object could have entered a pack is if 
it used to be reachable before through the commit history or reflog.  
So if they're not reachable anymore, that's most probably because their 
reflog expired.  So what's the point for keeping them even longer?  
What's the reasoning that led to the creation of --keep-unreachable in 
the first place?


Nicolas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/3] repack: modify behavior of -A option to leave unreferenced objects unpacked
  2008-05-11  1:10                                   ` Nicolas Pitre
@ 2008-05-11  1:23                                     ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2008-05-11  1:23 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jeff King, drafnel, git

Nicolas Pitre <nico@cam.org> writes:

> I'm now starting to wonder if there is a reason for keeping unreachable 
> objects that used to be packed.  Putting --keep-unreachable aside for 
> now, the only way an unreachable object could have entered a pack is if 
> it used to be reachable before through the commit history or reflog.  
> So if they're not reachable anymore, that's most probably because their 
> reflog expired.  So what's the point for keeping them even longer?  
> What's the reasoning that led to the creation of --keep-unreachable in 
> the first place?

I think the logic went like this.

(1) You may have rewound your head since you last repacked; blobs and
    trees in the rewound commit are already packed now.

(2) Now you may be fetching (or somebody else may be pushing) a commit
    that contains such blobs and/or trees, and the fetch or push is small
    enough that it unpacks, but the packed and unreachable ones are not
    unpacked.

(3) But before that fetch or push finishes to update the ref, you can race
    with a "repack -a -d".

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/3] repack: modify behavior of -A option to leave unreferenced objects unpacked
  2008-05-10  6:03                                 ` Jeff King
  2008-05-11  1:10                                   ` Nicolas Pitre
@ 2008-05-11  4:16                                   ` Brandon Casey
  2008-05-11  4:51                                     ` Brandon Casey
  1 sibling, 1 reply; 50+ messages in thread
From: Brandon Casey @ 2008-05-11  4:16 UTC (permalink / raw)
  To: Jeff King; +Cc: gitster, git

On Sat, May 10, 2008 at 1:03 AM, Jeff King <peff@peff.net> wrote:
> On Fri, May 09, 2008 at 11:01:55PM -0500, drafnel@gmail.com wrote:
>
>> -             keep_unreachable=--keep-unreachable ;;
>> +             keep_unreachable=t ;;
>
> Can we call this something else (like unpack_unreachable) since it now
> has nothing to do with the --keep-unreachable flag?

Actually I initially changed it to unpack_unreachable, and then
changed it back. The reason I did this is because I think
keep_unreachable still describes what is being accomplished, that
unreachables are being kept. When -A is supplied along with -d,
unreachables are kept by being unpacked. When -d is not supplied,
unreachables are kept in their original pack file. If Geert's proposal
or something else is implemented, keep_unreachable may still be
appropriate. hmm?

> Also, should --keep-unreachable be deprecated / removed?
>
>> +                     *)
>> +                             rm -f "$e.idx" "$e.keep"
>> +                             if test -n "$keep_unreachable" &&
>> +                                test -f "$e.pack"
>> +                             then
>> +                                     git unpack-objects < "$e.pack" || {
>> +                                             echo >&2 "Failed unpacking unreachable objects from redundant pack file $e.pack"
>> +                                             exit 1
>> +                                     }
>> +                             fi
>
> I still like Geert's suggestion of unpacking them to a _different_
> place. That helps to avoid spurious "gc --auto" invocations caused by
> too many prunable objects. Though it certainly doesn't solve it, and
> maybe that just needs to be fixed separately.

That was my thinking.

>
> Possibly the "gc --auto" test should be:
>
>  - count objects; if too few, exit
>  - count unreachable loose objects; if too few, exit
>  - run gc
>
> That means having a lot of unreachable objects will still incur some
> extra processing, but not as much as a full repack. And it won't bug the
> user with a "you need to repack" message.

I've got a thought. How about limiting how often auto repack repacks
by looking at the timestamp of the most recent pack? Wouldn't the
packs already be prepared in most cases i.e. prepare_packed_git()

-brandon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/3] repack: modify behavior of -A option to leave unreferenced objects unpacked
  2008-05-11  4:16                                   ` Brandon Casey
@ 2008-05-11  4:51                                     ` Brandon Casey
  0 siblings, 0 replies; 50+ messages in thread
From: Brandon Casey @ 2008-05-11  4:51 UTC (permalink / raw)
  To: Jeff King; +Cc: gitster, git

On Sat, May 10, 2008 at 11:16 PM, Brandon Casey <drafnel@gmail.com> wrote:

> I've got a thought. How about limiting how often auto repack repacks
> by looking at the timestamp of the most recent pack? Wouldn't the
> packs already be prepared in most cases i.e. prepare_packed_git()

completely untested and hopefully not mangled by google...

actually, this will do nothing for the case where there exists many
loose unreachable objects and no loose reachable objects since we
won't create a new pack with an updated timestamp to compare against.
So git-gc will continue to spin its wheels without getting anywhere.
Could we update the pack timestamp after running git-gc or use a
timestamp from someplace else?

-brandon

diff --git a/builtin-gc.c b/builtin-gc.c
index 48f7d95..16b1455 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -27,6 +27,7 @@ static int aggressive_window = -1;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static char *prune_expire = "2.weeks.ago";
+static time_t gc_auto_pack_frequency = 21600;  /* 6 hours */

 #define MAX_ADD 10
 static const char *argv_pack_refs[] = {"pack-refs", "--all", "--prune", NULL};
@@ -56,6 +57,10 @@ static int gc_config(const char *var, const char *value)
                gc_auto_pack_limit = git_config_int(var, value);
                return 0;
        }
+       if (!strcmp(var, "gc.autopackfrequency")) {
+               gc_auto_pack_frequency = git_config_ulong(var, value);
+               return 0;
+       }
        if (!strcmp(var, "gc.pruneexpire")) {
                if (!value)
                        return config_error_nonbool(var);
@@ -205,6 +210,14 @@ static int need_to_gc(void)
        else if (!too_many_loose_objects())
                return 0;

+       if (gc_auto_pack_frequency) {
+               prepare_packed_git();
+               if (packed_git &&
+                   packed_git->mtime >
+                   approxidate("now") - gc_auto_pack_frequency)
+                       return 0;
+       }
+
        if (run_hook())
                return 0;
        return 1;

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-10 16:24                                                     ` Jeremy Maitin-Shepard
@ 2008-05-11 11:11                                                       ` Johannes Schindelin
  2008-05-11 18:39                                                         ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Johannes Schindelin @ 2008-05-11 11:11 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: Jeff King, Junio C Hamano, Shawn O. Pearce, Nicolas Pitre,
	Brandon Casey, Geert Bosch, git

Hi,

On Sat, 10 May 2008, Jeremy Maitin-Shepard wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Sat, 10 May 2008, Jeremy Maitin-Shepard wrote:
> 
> >> Jeff King <peff@peff.net> writes:
> >> 
> >> > On Fri, May 09, 2008 at 09:51:15PM -0400, Jeremy Maitin-Shepard 
> >> > wrote:
> 
> >> >> When you create a new working directory, you would also create in 
> >> >> the original repository a symlink named e.g. 
> >> >> orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> 
> >> >> that points to the .git directory of the newly created working 
> >> >> directory.
> >> 
> >> > That assumes you _can_ write to the original repository. That may 
> >> > or may not be the case, depending on your setup.
> 
> > FWIW this argument can be found in the mailing list.  It does not have 
> > to be told over and over again, right?
> 
> Maybe you can point me at the relevant thread.  Fundamentally, though,
> I'd say objects/info/alternates _cannot_ work reliably without the
> source repository knowing about the objects that the sharing
> repositories need.  Otherwise, there is no way for it to know not to
> prune them.  The only way for it to have that information in general is
> to write it in the repository.  In a site-specific setting, it may
> indeed be possible to rely on some site-specific database, but that is
> not particularly relevant.
> 
> Currently repository sharing seems to be used in many cases in quite
> unsafe ways.  It may seem unfortunate that doing things the "safe way"
> is much more of a hassle and doesn't work in certain environments, but
> I'd say that is just the way things have to be.
> 
> Perhaps you can point me to an existing thread that addresses this idea, 
> though.

Unfortunately, a quick search did not turn up anything useful.  Maybe you 
try your luck yourself...

> >> Well, I suppose in that case it could print a warning or maybe fail 
> >> without some "force" option.  If you can't write to the repository, 
> >> then I think it is safe to say that it will never know or care about 
> >> you, so you will fundamentally have a fragile setup.  I'd say that 
> >> except in very special circumstances, you are better off just not 
> >> sharing it at all.
> 
> > Counterexample kernel.org.  Counterexample repo.or.cz.
> 
> repo.or.cz is not a counterexample.  It is completely "managed", and 
> could quite easily implement the approach I described.

Half true... you said "if you can't write to the repository..." and on 
repo.or.cz, the first part is true, the second part not.

> There is the issue that these setuid helper scripts would mean at the 
> very least that if user A can "fork" user B's repository, then to some 
> extent user B can make user A use large amounts of disk space (i.e. 
> exceed his quota or something) by just referencing a bunch of temporary 
> objects that user A happens to have in his repository, and it would take 
> careful examination of the git repository to actually figure out that it 
> is user B's fault.  I don't think this would be a significant problem in 
> practice, though.

Well, I think that the setuid helper script would open a whole bunch of 
other issues.

I think that the shared repository problem is rather a semantic one, i.e. 
it is only solvable between the owners of the repository by good-ole 
talking, not something that can be solved by the tool (Git).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: git gc & deleted branches
  2008-05-11 11:11                                                       ` Johannes Schindelin
@ 2008-05-11 18:39                                                         ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2008-05-11 18:39 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeremy Maitin-Shepard, Jeff King, Shawn O. Pearce, Nicolas Pitre,
	Brandon Casey, Geert Bosch, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Well, I think that the setuid helper script would open a whole bunch of 
> other issues.
>
> I think that the shared repository problem is rather a semantic one, i.e. 
> it is only solvable between the owners of the repository by good-ole 
> talking, not something that can be solved by the tool (Git).

I very strongly agree with you that the suid helper would be the last
ditch thing we would want to avoid doing unless there is no other way.  I
also agree with you that the owners of the repository need to be talking.

But I think the tool _could_ help them do their talking.  It is
conceivable that just like you can explicitly allow selected others to
push into your own repository, you would want to explicitly allow some
others to ask you to keep objects their repositories borrow from you.

	Originally, I wrote "allow others to borrow from you", but that is
	very ill defined.  If they can read from your repository they can
	unilaterally borrow from you without having any write permission
	to your repository that is needed to install backpointers.

I am not fundamentally opposed to a backpointer that point at borrowers so
that the lender can protect the objects that are pointed by them.
However, there are two technical issues in the solution of pointing at the
borrower's .git/refs with a symlink from the repository that is borrowed
from, as I pointed out when this came up last time.

 - There are systems without symbolic links.

 - A symref (e.g. refs/remotes/origin/HEAD of borrower that points at
   refs/remotes/origin/master of borrower) is relative to borrower's
   repository.

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2008-05-11 18:40 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-08 17:45 git gc & deleted branches Guido Ostkamp
2008-05-08 18:39 ` Jeff King
2008-05-08 18:55   ` Guido Ostkamp
2008-05-08 20:07     ` Brandon Casey
2008-05-08 20:52       ` Guido Ostkamp
2008-05-08 21:01         ` Jeff King
2008-05-08 21:15           ` Nicolas Pitre
2008-05-08 21:17             ` Jeff King
2008-05-08 21:23               ` Brandon Casey
2008-05-08 21:31                 ` Jeff King
2008-05-08 21:40                   ` Brandon Casey
2008-05-08 21:44                     ` Jeff King
2008-05-08 21:53                       ` Brandon Casey
2008-05-08 22:48                         ` Jeff King
2008-05-09  1:41                           ` Brandon Casey
2008-05-09  3:21                             ` Junio C Hamano
     [not found]                               ` <ee63ef30805082105w7f04a2d1y65a4618aeb787cac@mail.gmail.com>
     [not found]                                 ` <7v1w4bb291.fsf@gitster.siamese.dyndns.org>
2008-05-10  3:32                                   ` Brandon Casey
2008-05-10  4:15                                     ` Brandon Casey
2008-05-10  4:01                               ` [PATCH 0/3] leave unreferenced objects unpacked drafnel
2008-05-10  4:01                               ` [PATCH 1/3] repack: modify behavior of -A option to " drafnel
2008-05-10  6:03                                 ` Jeff King
2008-05-11  1:10                                   ` Nicolas Pitre
2008-05-11  1:23                                     ` Junio C Hamano
2008-05-11  4:16                                   ` Brandon Casey
2008-05-11  4:51                                     ` Brandon Casey
2008-05-10  4:01                               ` [PATCH 2/3] git-gc: always use -A when manually repacking drafnel
2008-05-10  4:01                               ` [PATCH 3/3] builtin-gc.c: deprecate --prune, it now really has no effect drafnel
2008-05-09  4:19                             ` git gc & deleted branches Jeff King
2008-05-09 15:00                               ` Geert Bosch
2008-05-09 15:14                                 ` Brandon Casey
2008-05-09 15:53                                   ` Jeff King
2008-05-09 15:56                                     ` Brandon Casey
2008-05-09 16:12                                   ` Nicolas Pitre
2008-05-09 16:54                                     ` Brandon Casey
2008-05-09 22:33                                     ` Junio C Hamano
2008-05-09 23:09                                       ` [PATCH] Updating documentation to match Brandon Casey's proposed git-repack patch Chris Frey
2008-05-10  0:07                                       ` git gc & deleted branches Jeremy Maitin-Shepard
2008-05-10  0:20                                         ` Shawn O. Pearce
2008-05-10  0:43                                           ` Jeremy Maitin-Shepard
2008-05-10  1:21                                           ` Junio C Hamano
2008-05-10  1:51                                             ` Jeremy Maitin-Shepard
2008-05-10  5:25                                               ` Jeff King
2008-05-10  5:36                                                 ` Jeremy Maitin-Shepard
2008-05-10  9:04                                                   ` Johannes Schindelin
2008-05-10 16:24                                                     ` Jeremy Maitin-Shepard
2008-05-11 11:11                                                       ` Johannes Schindelin
2008-05-11 18:39                                                         ` Junio C Hamano
2008-05-08 21:33           ` Guido Ostkamp
2008-05-08 20:56       ` Jeff King
2008-05-08 20:51     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).