git.vger.kernel.org archive mirror
* Rewriting history with git-filter-branch and leaking objects (?)
@ 2007-08-17 17:18 Mike Hommey
  2007-08-17 17:34 ` David Kastrup
  2007-08-17 18:31 ` Mike Hommey
  0 siblings, 2 replies; 7+ messages in thread
From: Mike Hommey @ 2007-08-17 17:18 UTC
  To: git

Hi,

I've been playing with git-filter-branch, and was wondering how objects
from the original branch are supposed to be removed.

It looks like removing the refs/original/* refs is not enough.

And it also looks like when all references seem to be removed, git-prune
doesn't fully do its job...

See the following transcript:

$ git init
Initialized empty Git repository in .git/
$ echo a > a ; echo b > b
$ git add a b
$ git commit -m "add a b"
Created initial commit b8875b1: add a b
 2 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 a
 create mode 100644 b
$ echo a >> a
$ git commit -a -m "update a"
Created commit fd97ed9: update a
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git-filter-branch --index-filter 'git-update-index --remove b' HEAD
Rewrite fd97ed9a2fef62eca824361fb62269e3c1fc0fb8 (2/2)
Ref 'refs/heads/master' was rewritten

These refs were rewritten:
fatal: Not a git repository: '/tmp/test/.git-rewrite/t/../../.git'

(This message appears when GIT_DIR is not set; I guess the patches
that were sent to the list a few days ago fix this issue.)

$ git-cat-file commit b8875b1
tree 3683f870be446c7cc05ffaef9fa06415276e1828
author Mike Hommey <mh@namakemono.glandium.org> 1187369087 +0200
committer Mike Hommey <mh@namakemono.glandium.org> 1187369087 +0200

add a b

(not a surprise, since we still have the refs/original/refs/heads/master
ref)

$ git-update-ref -d refs/original/refs/heads/master fd97ed9
$ git-cat-file commit b8875b1
tree 3683f870be446c7cc05ffaef9fa06415276e1828
author Mike Hommey <mh@namakemono.glandium.org> 1187369087 +0200
committer Mike Hommey <mh@namakemono.glandium.org> 1187369087 +0200

add a b
$ git fsck

(okay, so it is still here, and obviously still referenced; it appears
to be referenced in .git/logs/...)

$ rm .git/logs/refs/heads/master 
$ rm .git/logs/HEAD
$ git fsck
dangling commit fd97ed9a2fef62eca824361fb62269e3c1fc0fb8

(finally! So here is a first question: is there a proper way to clean
this out? Removing the logs with rm sounds brutal...)

$ git-prune -n
3683f870be446c7cc05ffaef9fa06415276e1828 tree
b8875b1095616c1e7e8f8ffce8ebc172059367ea commit
fd97ed9a2fef62eca824361fb62269e3c1fc0fb8 commit
$ git-cat-file commit fd97ed9a2fef62eca824361fb62269e3c1fc0fb8
tree c1f89248e4b6e47a4529d50d37b0840a14d2efb0
parent b8875b1095616c1e7e8f8ffce8ebc172059367ea
author Mike Hommey <mh@namakemono.glandium.org> 1187369110 +0200
committer Mike Hommey <mh@namakemono.glandium.org> 1187369110 +0200

update a

(Why doesn't prune -n tell me it would remove
c1f89248e4b6e47a4529d50d37b0840a14d2efb0, which it should, AFAIK?)

Mike


* Re: Rewriting history with git-filter-branch and leaking objects (?)
  2007-08-17 17:18 Rewriting history with git-filter-branch and leaking objects (?) Mike Hommey
@ 2007-08-17 17:34 ` David Kastrup
  2007-08-17 17:46   ` Mike Hommey
  2007-08-17 21:11   ` Junio C Hamano
  2007-08-17 18:31 ` Mike Hommey
  1 sibling, 2 replies; 7+ messages in thread
From: David Kastrup @ 2007-08-17 17:34 UTC
  To: git

Mike Hommey <mh@glandium.org> writes:

> I've been playing with git-filter-branch, and was wondering how objects
> from the original branch are supposed to be removed.
>
> It looks like removing the refs/original/* refs is not enough.
>
> And it also looks like when all references seem to be removed, git-prune
> doesn't fully do its job...

It is quite hard to get rid of objects.  You need to get the
reflogs for the commits and the files expired.

The last time I tried this, I ended up unpacking the packed objects,
calling git-fsck with appropriate options to tell me about
unreferenced objects when ignoring reflogs, and removing the files
manually with xargs and rm.
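
A rough sketch of how such a listing can be obtained with a current Git.
The choice of `git fsck --unreachable --no-reflogs` is an assumption about
which options are meant (the message does not name them), and the
throwaway repository and identity below are placeholders:

```shell
#!/bin/sh
# Sketch only: list objects that nothing but the reflog keeps alive.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.invalid"
echo a > a
git add a
git commit -q -m "first version"
old=$(git rev-parse HEAD)
# After the amend, the old commit is referenced only from the reflog.
git commit -q --amend -m "rewritten version"

# Reflog entries count as references by default, so the old commit
# is not reported here.
with_reflogs=$(git fsck --unreachable 2>/dev/null)
# Ignoring the reflogs exposes it.
without_reflogs=$(git fsck --unreachable --no-reflogs 2>/dev/null)
printf '%s\n' "$without_reflogs"
```

The second listing is the kind of output that could then be fed to xargs
and rm once the objects are loose files.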

Probably I was not able to do something reasonably intelligent, but
making git actually _lose_ data/commits/whatever is really, really
hard.  I have messed up my repo structure considerably several times,
and everything is still there, with the reflog telling you how to get
it.

Given how easy it is to shoot oneself in the foot with git, it is not
the worst thing.  But you really have to work if you _mean_ it.

-- 
David Kastrup


* Re: Rewriting history with git-filter-branch and leaking objects (?)
  2007-08-17 17:34 ` David Kastrup
@ 2007-08-17 17:46   ` Mike Hommey
  2007-08-19  2:34     ` Sam Vilain
  2007-08-17 21:11   ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Hommey @ 2007-08-17 17:46 UTC
  To: David Kastrup; +Cc: git

On Fri, Aug 17, 2007 at 07:34:32PM +0200, David Kastrup <dak@gnu.org> wrote:
> Mike Hommey <mh@glandium.org> writes:
> 
> > I've been playing with git-filter-branch, and was wondering how objects
> > from the original branch are supposed to be removed.
> >
> > It looks like removing the refs/original/* refs is not enough.
> >
> > And it also looks like when all references seem to be removed, git-prune
> > doesn't fully do its job...
> 
> It is quite hard to get rid of objects.  You need to get the
> reflogs for the commits and the files expired.
> 
> The last time I tried this, I ended up unpacking the packed objects,
> calling git-fsck with appropriate options to tell me about
> unreferenced objects when ignoring reflogs, and removing the files
> manually with xargs and rm.
> 
> Probably I was not able to do something reasonably intelligent, but
> making git actually _lose_ data/commits/whatever is really, really
> hard.  I have messed up my repo structure considerably several times,
> and everything is still there, with the reflog telling you how to get
> it.
> 
> Given how easy it is to shoot oneself in the foot with git, it is not
> the worst thing.  But you really have to work if you _mean_ it.

Well, with the introduction of git-filter-branch, once you have
rewritten your history and validated that everything is okay,
you might mean to remove the original branch...

Mike


* Re: Rewriting history with git-filter-branch and leaking objects (?)
  2007-08-17 17:18 Rewriting history with git-filter-branch and leaking objects (?) Mike Hommey
  2007-08-17 17:34 ` David Kastrup
@ 2007-08-17 18:31 ` Mike Hommey
  2007-08-19 19:59   ` Mike Hommey
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Hommey @ 2007-08-17 18:31 UTC
  To: git

On Fri, Aug 17, 2007 at 07:18:51PM +0200, Mike Hommey <mh@glandium.org> wrote:
> $ rm .git/logs/refs/heads/master 
> $ rm .git/logs/HEAD

git-reflog expire --expire-unreachable=$(date +%s) --all

is cleaner, but git prune -n still misses a tree.
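
Put together, the whole cleanup can be sketched as below. This is an
illustration under assumptions, not the tool's documented procedure: the
filter-branch run is simulated with plumbing commands, `now` is used in
place of `$(date +%s)`, and the repository and identity are placeholders.

```shell
#!/bin/sh
# Sketch only: drop the pre-rewrite objects once the rewrite has been checked.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.invalid"
echo a > a; echo b > b
git add a b
git commit -q -m "add a b"
old=$(git rev-parse HEAD)
branch=$(git symbolic-ref HEAD)          # e.g. refs/heads/master

# Stand-in for the filter-branch run: a rewritten commit without b,
# with the old tip saved under refs/original/.
git rm -q --cached b
new=$(git commit-tree "$(git write-tree)" -m "add a")
git update-ref "refs/original/$branch" "$old"
git update-ref "$branch" "$new"

# The cleanup itself: drop the backup ref, expire reflog entries that
# mention unreachable commits, then prune loose objects.
git update-ref -d "refs/original/$branch"
git reflog expire --expire-unreachable=now --all
git prune

git cat-file -e "$old" 2>/dev/null || echo "old commit is gone"
```

Because the index is updated here before the new commit is written, no
old tree is left behind in this sketch.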

Mike


* Re: Rewriting history with git-filter-branch and leaking objects (?)
  2007-08-17 17:34 ` David Kastrup
  2007-08-17 17:46   ` Mike Hommey
@ 2007-08-17 21:11   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2007-08-17 21:11 UTC
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> Mike Hommey <mh@glandium.org> writes:
>
>> I've been playing with git-filter-branch, and was wondering how objects
>> from the original branch are supposed to be removed.
>>
>> It looks like removing the refs/original/* refs is not enough.
>>
>> And it also looks like when all references seem to be removed, git-prune
>> doesn't fully do its job...
>
> It is quite hard to get rid of objects.  You need to get the
> reflogs for the commits and the files expired.

An easier way is probably to make a new clone in the
neighbouring directory locally.
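
A minimal sketch of that approach with a current Git. The `file://` URL
is an assumption on my part: it sends the clone through the normal
transfer machinery, whereas a plain path may copy or hardlink the whole
object store. Directory names and the identity are placeholders.

```shell
#!/bin/sh
# Sketch only: a fresh clone carries over just the objects reachable from refs.
set -e
work=$(mktemp -d)                        # throwaway parent directory
git init -q "$work/src"
cd "$work/src"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.invalid"
echo a > a
git add a
git commit -q -m "add a"
# An object no ref points at, standing in for leftover pre-rewrite history.
stray=$(echo "leftover" | git hash-object -w --stdin)

# Clone into the neighbouring directory through the file:// transport.
git clone -q "file://$work/src" "$work/clean"

git -C "$work/clean" cat-file -e "$stray" 2>/dev/null ||
    echo "stray object was not copied"
```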


* Re: Rewriting history with git-filter-branch and leaking objects (?)
  2007-08-17 17:46   ` Mike Hommey
@ 2007-08-19  2:34     ` Sam Vilain
  0 siblings, 0 replies; 7+ messages in thread
From: Sam Vilain @ 2007-08-19  2:34 UTC
  To: Mike Hommey; +Cc: David Kastrup, git

Mike Hommey wrote:
> Well, with the introduction of git-filter-branch, once you have
> rewritten your history and validated that everything is okay,
> you might mean to remove the original branch...
>   

Right, but it's probably better to leave a trail.  I tend to use
refs/Attic/* for branches that I've re-written (if I published them). 
That way, nothing will send or receive them by default.  Sometimes I'll
have a "clean" repository which doesn't have any of them for faster
initial cloning.
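
As a sketch of this convention (the ref name `master-before-rewrite` is
made up for illustration, the rewrite is simulated with an amend, and the
repository and identity are placeholders):

```shell
#!/bin/sh
# Sketch only: keep a trail of rewritten history under refs/Attic/.
set -e
work=$(mktemp -d)                        # throwaway parent directory
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.invalid"
echo a > a
git add a
git commit -q -m "add a"
old=$(git rev-parse HEAD)

# Park the pre-rewrite tip outside refs/heads/ and refs/tags/.
git update-ref refs/Attic/master-before-rewrite "$old"

# Stand-in for a rewrite, followed by an aggressive cleanup.
git commit -q --amend -m "add a (rewritten)"
git reflog expire --expire-unreachable=now --all
git prune
git cat-file -t "$old"                   # the Attic ref keeps the old commit

# A default clone fetches branches and tags only, so the Attic ref
# stays behind in the original repository.
git clone -q "file://$work/repo" "$work/clean"
git -C "$work/clean" for-each-ref refs/Attic
```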

Sam.


* Re: Rewriting history with git-filter-branch and leaking objects (?)
  2007-08-17 18:31 ` Mike Hommey
@ 2007-08-19 19:59   ` Mike Hommey
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Hommey @ 2007-08-19 19:59 UTC
  To: git

On Fri, Aug 17, 2007 at 08:31:15PM +0200, Mike Hommey <mh@glandium.org> wrote:
> On Fri, Aug 17, 2007 at 07:18:51PM +0200, Mike Hommey <mh@glandium.org> wrote:
> > $ rm .git/logs/refs/heads/master 
> > $ rm .git/logs/HEAD
> 
> git-reflog expire --expire-unreachable=$(date +%s) --all
> 
> is cleaner, but git prune -n still misses a tree.

FWIW, I found out what was happening, and why there was still a tree not
being pruned: it was used by the index.

So all in all, no spurious object after cleanup.
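
The effect can be reproduced roughly as follows. This is a sketch with a
current Git under assumptions: the rewrite is simulated with a scratch
index (the `index.scratch` file name is made up) so that the real index
keeps describing the old tree, and `git reset` is my choice for
refreshing the index afterwards.

```shell
#!/bin/sh
# Sketch only: the index counts as a reference when pruning.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.invalid"
echo a > a; echo b > b
git add a b
git commit -q -m "add a b"
old_tree=$(git rev-parse 'HEAD^{tree}')
branch=$(git symbolic-ref HEAD)

# Build the rewritten tree in a scratch index; the real index is untouched.
new_tree=$(
    export GIT_INDEX_FILE="$repo/.git/index.scratch"
    git read-tree HEAD
    git rm -q --cached b
    git write-tree
)
rm -f "$repo/.git/index.scratch"
new=$(git commit-tree "$new_tree" -m "add a")
git update-ref "$branch" "$new"
git reflog expire --expire-unreachable=now --all

before=$(git prune -n)   # old commit is listed; old tree is still held by the index
git reset -q             # re-read the index from the rewritten HEAD
after=$(git prune -n)    # now the old tree shows up as prunable as well
printf 'before reset:\n%s\nafter reset:\n%s\n' "$before" "$after"
```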

Mike
