git.vger.kernel.org archive mirror
* Cleaning the .git directory with gc
@ 2008-04-23 23:13 Haakon Riiser
  2008-04-24  0:09 ` Russ Dill
  0 siblings, 1 reply; 6+ messages in thread
From: Haakon Riiser @ 2008-04-23 23:13 UTC
  To: git

I've recently started using git, and while experimenting with
git commit --amend, I noticed that git gc does not do what I
expected.  Example:

  $ mkdir foo && cd foo
  $ git init
  $ dd if=/dev/urandom bs=1k count=1000 of=rand.dat
  $ git add .
  $ git commit -a -m 'first rev'
  $ du -s .git
  1100    .git

1 MB file checked in, 1 MB repository.  So far, so good.

  $ dd if=/dev/urandom bs=1k count=1000 of=rand.dat
  $ git commit -a -m 'replaced first rev' --amend
  $ du -s .git
  2120    .git

At this point, I expected the --amend command to notice that
the amended commit contains a replacement for the old file,
and thus that the repository didn't grow.  I then figured that
if --amend doesn't do that by itself, git gc surely will:

  $ git gc
  $ du -s .git
  2104    .git

So, why doesn't gc remove the data from the first commit?  Is it
still accessible, even though the log doesn't show it?

Is it possible to actually replace the commit, i.e., to make it
exactly as if the first commit never happened at all?  (Without
modifying the repository by hand.)

-- 
 Haakon

* Re: Cleaning the .git directory with gc
  2008-04-23 23:13 Cleaning the .git directory with gc Haakon Riiser
@ 2008-04-24  0:09 ` Russ Dill
  2008-04-24  0:32   ` David Tweed
  2008-04-24  0:50   ` Shawn O. Pearce
  0 siblings, 2 replies; 6+ messages in thread
From: Russ Dill @ 2008-04-24  0:09 UTC
  To: Haakon Riiser; +Cc: git

On Wed, Apr 23, 2008 at 4:13 PM, Haakon Riiser <haakon.riiser@fys.uio.no> wrote:
> I've recently started using git, and while experimenting with
>  git commit --amend, I noticed that git gc does not do what I
>  expected.  Example:

That's a lot of work without first reading the man page:

       --prune
           Usually git-gc packs refs, expires old reflog entries, packs loose
           objects, and removes old rerere records. Removal of unreferenced
           loose objects is an unsafe operation while other git operations are
           in progress, so it is not done by default. Pass this option if you
           want it, and only when you know nobody else is creating new objects
           in the repository at the same time (e.g. never use this option in a
           cron script).

* Re: Cleaning the .git directory with gc
  2008-04-24  0:09 ` Russ Dill
@ 2008-04-24  0:32   ` David Tweed
  2008-04-24  0:57     ` Shawn O. Pearce
  2008-04-24  0:50   ` Shawn O. Pearce
  1 sibling, 1 reply; 6+ messages in thread
From: David Tweed @ 2008-04-24  0:32 UTC
  To: Russ Dill; +Cc: Haakon Riiser, git

On Thu, Apr 24, 2008 at 1:09 AM, Russ Dill <russ.dill@gmail.com> wrote:
> On Wed, Apr 23, 2008 at 4:13 PM, Haakon Riiser <haakon.riiser@fys.uio.no> wrote:
>  > I've recently started using git, and while experimenting with
>  >  git commit --amend, I noticed that git gc does not do what I
>  >  expected.  Example:
>
>  That's a lot of work without first reading the man page:
>
>        --prune
[snip]

There's a relatively recent change in this area. Git keeps objects
that are apparently unattached for a period of, by default, 2 weeks
(determined by the gc.pruneexpire variable), after which a git gc will
remove them. The reasoning is that, even with the careful design of the
git updating strategy, there are rare times when a concurrent git
process leaves files in the repo that look unattached but will become
attached as that process completes. Files kept this way aren't
propagated by clones or pulls, so they're essentially invisible to
everything else. If you're sure, you can force removal with

git prune --expire now

AFAICS there's no way to call "git gc --prune" with an --expire option
so you've got to use the "git prune" command.
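
A minimal sketch of that grace period, run in a throwaway repository.
The temporary directory and the variable names are placeholders for
illustration, and the 2.weeks.ago value is passed explicitly only to
mirror the default described above:

```shell
#!/bin/sh
# Sketch: an unreferenced loose object survives a prune that honours the
# two-week grace period, and disappears with "git prune --expire now".
set -e
repo=$(mktemp -d)          # throwaway repository (placeholder location)
cd "$repo"
git init -q

# Write a blob into the object store without referencing it from anywhere.
blob=$(echo 'unattached data' | git hash-object -w --stdin)

# Prune with the grace period: the fresh object is too young to remove.
git prune --expire 2.weeks.ago
if git cat-file -e "$blob" 2>/dev/null; then
    kept_with_grace=yes
else
    kept_with_grace=no
fi

# Prune with no grace period: the object is removed regardless of age.
git prune --expire now
if git cat-file -e "$blob" 2>/dev/null; then
    kept_after_now=yes
else
    kept_after_now=no
fi

echo "kept with grace period: $kept_with_grace"
echo "kept after --expire now: $kept_after_now"
```

This only covers objects that nothing refers to at all; it says nothing
about objects that a ref or a reflog entry still points at.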

HTH

-- 
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot

* Re: Cleaning the .git directory with gc
  2008-04-24  0:09 ` Russ Dill
  2008-04-24  0:32   ` David Tweed
@ 2008-04-24  0:50   ` Shawn O. Pearce
  2008-04-24 21:14     ` Haakon Riiser
  1 sibling, 1 reply; 6+ messages in thread
From: Shawn O. Pearce @ 2008-04-24  0:50 UTC
  To: Russ Dill; +Cc: Haakon Riiser, git

Russ Dill <russ.dill@gmail.com> wrote:
> On Wed, Apr 23, 2008 at 4:13 PM, Haakon Riiser <haakon.riiser@fys.uio.no> wrote:
> > I've recently started using git, and while experimenting with
> >  git commit --amend, I noticed that git gc does not do what I
> >  expected.  Example:
> 
> That's a lot of work without first reading the man page:
> 
>        --prune
>            Usually git-gc packs refs, expires old reflog entries, packs loose
>            objects, and removes old rerere records. Removal of unreferenced
>            loose objects is an unsafe operation while other git operations are
>            in progress, so it is not done by default. Pass this option if you
>            want it, and only when you know nobody else is creating new objects
>            in the repository at the same time (e.g. never use this option in a
>            cron script).

But even with `git gc --prune` the old commit object will still
be in your repository.

Why?  Both the HEAD reflog and your branch's reflog hold a reference
to the old commit.  Those entries remain there for 90 days by default,
so that you could always go back and get it if you _really_
had to recover it.  Take a look with `git reflog show HEAD`
or `git log -g` and you'll see what I mean.
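
As a rough illustration of that point, the following sketch amends a
commit in a scratch repository and then looks for the old commit.  The
demo identity, file name and variable names are placeholders, not
anything from the original example:

```shell
#!/bin/sh
# Sketch: after "git commit --amend" the old commit is absent from the
# log but is still listed in the HEAD reflog and still in the object store.
set -e
# Placeholder identity so the commits work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)          # throwaway repository (placeholder location)
cd "$repo"
git init -q

echo one > file.txt
git add file.txt
git commit -q -m 'first rev'
old=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -a --amend -m 'replaced first rev'
new=$(git rev-parse HEAD)

# The reflog lists both the original commit and the amended one.
git --no-pager reflog show HEAD
reflog_entries=$(git --no-pager reflog show HEAD | wc -l | tr -d ' ')

# The old commit is still a valid object...
old_type=$(git cat-file -t "$old")
# ...even though the ordinary log no longer mentions it.
in_log=$(git --no-pager log --format=%H | grep -c "$old" || true)

echo "reflog entries: $reflog_entries, old object type: $old_type, in log: $in_log"
```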

A commit is peanuts when it comes to disk space.  Don't worry
about it.  After a lot of amends and such you will be carrying
around only a few extra MBs.  In return for those few extra MBs
you are always able to recover anything, up to 3 months back.

If you _really_ need to whack all of that away, make a clone
and then discard the old one, e.g.:

	git clone file://`pwd`/old_proj new_proj

Note you need to use the file:// URI syntax to prevent Git from
just hardlinking everything.  It takes a little longer, but the
resulting new_proj will be cruft free.
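
A small sketch of the clone approach, assuming a scratch directory and
a placeholder demo identity (old_proj and new_proj are the names used
above; everything else is made up for illustration).  It checks whether
the pre-amend commit made it into the clone:

```shell
#!/bin/sh
# Sketch: a clone made through a file:// URL receives only reachable
# objects, so the pre-amend commit is not copied into new_proj.
set -e
# Placeholder identity so the commits work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)          # scratch area holding both repositories
cd "$work"
git init -q old_proj
cd old_proj

echo one > file.txt
git add file.txt
git commit -q -m 'first rev'
old=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -a --amend -m 'replaced first rev'

cd "$work"
# file:// forces the normal transport instead of hardlinking the objects.
git clone -q "file://$work/old_proj" new_proj

if git -C old_proj cat-file -e "$old" 2>/dev/null; then
    old_in_orig=yes
else
    old_in_orig=no
fi
if git -C new_proj cat-file -e "$old" 2>/dev/null; then
    old_in_clone=yes
else
    old_in_clone=no
fi

echo "old commit in old_proj: $old_in_orig, in new_proj: $old_in_clone"
```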

-- 
Shawn.

* Re: Cleaning the .git directory with gc
  2008-04-24  0:32   ` David Tweed
@ 2008-04-24  0:57     ` Shawn O. Pearce
  0 siblings, 0 replies; 6+ messages in thread
From: Shawn O. Pearce @ 2008-04-24  0:57 UTC
  To: David Tweed; +Cc: Russ Dill, Haakon Riiser, git

David Tweed <david.tweed@gmail.com> wrote:
> On Thu, Apr 24, 2008 at 1:09 AM, Russ Dill <russ.dill@gmail.com> wrote:
> > On Wed, Apr 23, 2008 at 4:13 PM, Haakon Riiser <haakon.riiser@fys.uio.no> wrote:
> >  > I've recently started using git, and while experimenting with
> >  >  git commit --amend, I noticed that git gc does not do what I
> >  >  expected.  Example:
> >
> >  That's a lot of work without first reading the man page:
> >
> >        --prune
> [snip]
> 
> There's a relatively recent change in this area. Git keeps stuff
> that's apparently unattached for a period of, by default, 2 weeks
> (determined by gc.pruneexpire variable) after which a git gc will
> remove it. The reasoning is that even with the careful design of the
> git updating strategy there are rare times when with a concurrent
> other git process there are files in the repo that look unattached but
> will become attached as the other process completes.

Although that's certainly true, the original poster was asking about
`git commit --amend`.  In such a case the reflog for HEAD and the
current branch are going to anchor the old commit for the reflog
expire period, which is 90 days.  Way longer than the 2 week aging
of loose objects.
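
For completeness, a hedged sketch of dropping the reflog anchor in
place rather than cloning.  It assumes a later Git than the one
discussed in this thread, one where `git gc` accepts `--prune=<date>`;
the scratch repository, demo identity and variable names are
placeholders.  Expiring the reflog throws away the recovery safety net,
so this is only for when the old data really must go:

```shell
#!/bin/sh
# Sketch: a plain gc keeps the pre-amend commit because the reflog still
# refers to it; expiring the reflog first lets gc remove it.
set -e
# Placeholder identity so the commits work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)          # throwaway repository (placeholder location)
cd "$repo"
git init -q

echo one > file.txt
git add file.txt
git commit -q -m 'first rev'
old=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -a --amend -m 'replaced first rev'

# Plain gc: the old commit is still anchored by the reflog.
git gc -q
if git cat-file -e "$old" 2>/dev/null; then
    present_after_gc=yes
else
    present_after_gc=no
fi

# Drop every reflog entry, then prune unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then
    present_after_expire=yes
else
    present_after_expire=no
fi

echo "after plain gc: $present_after_gc, after reflog expire + prune: $present_after_expire"
```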

-- 
Shawn.

* Re: Cleaning the .git directory with gc
  2008-04-24  0:50   ` Shawn O. Pearce
@ 2008-04-24 21:14     ` Haakon Riiser
  0 siblings, 0 replies; 6+ messages in thread
From: Haakon Riiser @ 2008-04-24 21:14 UTC
  To: git

[Shawn O. Pearce]

> [...] 
> If you _really_ need to whack all of that away, make a clone
> and then discard the old one, e.g.:
> 
> 	git clone file://`pwd`/old_proj new_proj
> 
> Note you need to use the file:// URI syntax to prevent Git from
> just hardlinking everything.  It takes a little longer, but the
> resulting new_proj will be cruft free.

Thanks for answering my question so quickly.  I really wasn't that
worried about it, but it seemed strange that the --amend option only
seemed to affect the logs.  Anyway, it's nice to know I can start
with a clean slate if I want to by cloning from a file:// path.

-- 
 Haakon
