git.vger.kernel.org archive mirror
* How to remove a commit object?
@ 2008-09-18 23:41 Steven Grimm
  2008-09-19  9:16 ` Michael J Gruber
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Grimm @ 2008-09-18 23:41 UTC (permalink / raw)
  To: Git Users List

I maintain a shared repository a bunch of my coworkers push to for  
code reviews. It has accumulated a lot of packfiles, so I want to  
shrink it down a bit, but there's a problem:

% git repack -A -d
Counting objects: ...
error: Could not read 125bf191b65189aaec7a6aa24ff26460d141d587
fatal: bad tree object 125bf191b65189aaec7a6aa24ff26460d141d587

"git fsck" confirms that the tree object is missing:

% git fsck
broken link from  commit 1b2f0595bb4a6c2e17ca43a9cc41feec88c72a47
               to    tree 125bf191b65189aaec7a6aa24ff26460d141d587
...
missing tree 125bf191b65189aaec7a6aa24ff26460d141d587

This is a dangling commit, but that's fine; for this particular  
repository we actually *want* lots of dangling commits since they  
represent the history of people's code review requests. (Hence me  
running git-repack with -A instead of -a.)

Given that it's dangling, it seems like it'd be safe to just remove  
entirely (we lose that little bit of code-review history but we've  
lost it already anyway with the tree object missing). But I'm not sure  
how to do it. Is it possible to delete a commit object, and if so, how?

I don't know how the corruption happened in the first place. There was  
a short time at one point where the permissions on the object  
directories were inconsistent, so it's possible someone pushed during  
that period and managed to create the commit object file in
.git/objects but didn't have permission to create the tree object. That's
just speculation on my part, though. This is the only corrupt object  
in the repository according to git-fsck, so at this point I just want  
to know how to get rid of it so I can do the repack.

Thanks!

-Steve


* Re: How to remove a commit object?
  2008-09-18 23:41 How to remove a commit object? Steven Grimm
@ 2008-09-19  9:16 ` Michael J Gruber
  2008-10-02 13:36   ` Klas Lindberg
  0 siblings, 1 reply; 11+ messages in thread
From: Michael J Gruber @ 2008-09-19  9:16 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Git Users List

Steven Grimm venit, vidit, dixit 19.09.2008 01:41:
> I maintain a shared repository a bunch of my coworkers push to for  
> code reviews. It has accumulated a lot of packfiles, so I want to  
> shrink it down a bit, but there's a problem:
> 
> % git repack -A -d
> Counting objects: ...
> error: Could not read 125bf191b65189aaec7a6aa24ff26460d141d587
> fatal: bad tree object 125bf191b65189aaec7a6aa24ff26460d141d587
> 
> "git fsck" confirms that the tree object is missing:
> 
> % git fsck
> broken link from  commit 1b2f0595bb4a6c2e17ca43a9cc41feec88c72a47
>                to    tree 125bf191b65189aaec7a6aa24ff26460d141d587
> ...
> missing tree 125bf191b65189aaec7a6aa24ff26460d141d587
> 
> This is a dangling commit, but that's fine; for this particular  
> repository we actually *want* lots of dangling commits since they  
> represent the history of people's code review requests. (Hence me  
> running git-repack with -A instead of -a.)
> 
> Given that it's dangling, it seems like it'd be safe to just remove  
> entirely (we lose that little bit of code-review history but we've  
> lost it already anyway with the tree object missing). But I'm not sure  
> how to do it. Is it possible to delete a commit object, and if so, how?
> 
> I don't know how the corruption happened in the first place. There was  
> a short time at one point where the permissions on the object  
> directories were inconsistent, so it's possible someone pushed during  
> that period and managed to create the commit object file in
> .git/objects but didn't have permission to create the tree object. That's
> just speculation on my part, though. This is the only corrupt object  
> in the repository according to git-fsck, so at this point I just want  
> to know how to get rid of it so I can do the repack.

git prune should delete dangling commits. Is that commit already in a
pack? Then the -f option to repack may help.

Michael
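
A minimal sketch of the "git prune" suggestion above, run against a
throwaway repository rather than the one from the original report. The
file name, identity, and commit messages are made up for illustration, and
the dangling commit is created loose (not packed), which is the case prune
handles directly.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q .
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example"

echo hello > file.txt
git add file.txt
git commit -q -m "reachable commit"

# Create a commit that no ref or reflog entry points at (a dangling commit).
tree=$(git write-tree)
dangling=$(echo "dangling" | git commit-tree "$tree" -p HEAD)
if git cat-file -e "$dangling" 2>/dev/null; then before=present; else before=gone; fi

# prune deletes unreachable loose objects; --expire=now makes the cutoff explicit.
git prune --expire=now
if git cat-file -e "$dangling" 2>/dev/null; then after=present; else after=gone; fi

echo "before prune: $before, after prune: $after"
```

Note that prune only touches loose objects; an unreachable object that is
already inside a pack needs a repack, as mentioned above.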


* Re: How to remove a commit object?
  2008-09-19  9:16 ` Michael J Gruber
@ 2008-10-02 13:36   ` Klas Lindberg
  2008-10-02 14:00     ` Michael J Gruber
  2008-10-02 14:02     ` Jakub Narebski
  0 siblings, 2 replies; 11+ messages in thread
From: Klas Lindberg @ 2008-10-02 13:36 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Steven Grimm, Git Users List

This doesn't seem to work for me. I will soon be in a situation where
I need to selectively delete commits in such a way that they become
completely irrecoverable. I.e. it is not enough to revert a commit.
The *original* commit must be removed. And of course, the repo history
is too complex to allow for rebasing followed by garbage collection or
something like that.

The reason is that we consider opening a repository to external
participants, but some commits contain stuff that we'd really rather
not show to anyone else. Making the repository public without losing
history would then force us to either

 1. Recreate every commit in a new repo, sans the offending commits.
Seems like hard work.
 2. ?

Would it be feasible to write a tool that can selectively replace a
specific commit in the commit DAG, or would that automatically
invalidate every SHA key for every commit that follows the replaced
original?

BR / Klas


* Re: How to remove a commit object?
  2008-10-02 13:36   ` Klas Lindberg
@ 2008-10-02 14:00     ` Michael J Gruber
  2008-10-02 14:02     ` Jakub Narebski
  1 sibling, 0 replies; 11+ messages in thread
From: Michael J Gruber @ 2008-10-02 14:00 UTC (permalink / raw)
  To: Klas Lindberg; +Cc: Steven Grimm, Git Users List

Klas Lindberg venit, vidit, dixit 02.10.2008 15:36:
> This doesn't seem to work for me. I will soon be in a situation where
> I need to selectively delete commits in such a way that they become
> completely irrecoverable. I.e. it is not enough to revert a commit.
> The *original* commit must be removed. And of course, the repo history
> is too complex to allow for rebasing followed by garbage collection or
> something like that.
> 
> The reason is that we consider opening a repository to external
> participants, but some commits contain stuff that we'd really rather
> not show to anyone else. Making the repository public without losing
> history would then force us to either
> 
>  1. Recreate every commit in a new repo, sans the offending commits.
> Seems like hard work.
>  2. ?
> 
> Would it be feasible to write a tool that can selectively replace a
> specific commit in the commit DAG, or would that automatically
> invalidate every SHA key for every commit that follows the replaced
> original?

Yes to the "or" part: if you change a commit, then all commits "after"
that one (in terms of DAG connectedness) must change as well, because each
contains a "backpointer" to its parent commit(s), and that pointer changes.

I'm a bit confused: You rule out rebasing but don't mind recreating a
new repo. So repo size is not a problem, is it?

Michael
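
The backpointer effect can be seen in a small experiment. This is only a
sketch under assumed conditions: a throwaway repository, a made-up identity,
and fixed dates so that the parent is the only thing that differs between
the two child commits.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Pin identity and dates so the commit contents are fully determined.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com
export GIT_AUTHOR_DATE="2008-10-02 14:00:00 +0000"
export GIT_COMMITTER_DATE="2008-10-02 14:00:00 +0000"

echo hello > file.txt
git add file.txt
tree=$(git write-tree)

# Two versions of the "same" parent that differ only in the message.
parent_a=$(echo "original parent"  | git commit-tree "$tree")
parent_b=$(echo "rewritten parent" | git commit-tree "$tree")

# Identical child commits, except for the parent they point back to.
child_a=$(echo "child" | git commit-tree "$tree" -p "$parent_a")
child_b=$(echo "child" | git commit-tree "$tree" -p "$parent_b")
child_a_again=$(echo "child" | git commit-tree "$tree" -p "$parent_a")

# The parent line is part of the commit object, hence part of what is hashed.
git cat-file -p "$child_a" | grep '^parent '
echo "child on original parent:  $child_a"
echo "child on rewritten parent: $child_b"
```

With everything else held constant, the child ID changes exactly when the
parent ID does, which is why a rewrite propagates to every descendant.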


* Re: How to remove a commit object?
  2008-10-02 13:36   ` Klas Lindberg
  2008-10-02 14:00     ` Michael J Gruber
@ 2008-10-02 14:02     ` Jakub Narebski
  2008-10-02 14:26       ` Klas Lindberg
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Narebski @ 2008-10-02 14:02 UTC (permalink / raw)
  To: Klas Lindberg; +Cc: Michael J Gruber, Steven Grimm, Git Users List

"Klas Lindberg" <klas.lindberg@gmail.com> writes:

> This doesn't seem to work for me. I will soon be in a situation where
> I need to selectively delete commits in such a way that they become
> completely irrecoverable. I.e. it is not enough to revert a commit.
> The *original* commit must be removed. And of course, the repo history
> is too complex to allow for rebasing followed by garbage collection or
> something like that.
[...]

> Would it be feasible to write a tool that can selectively replace a
> specific commit in the commit DAG, or would that automatically
> invalidate every SHA key for every commit that follows the replaced
> original?

It would invalidate the SHA-1 of every commit that descends from the
first rewritten one.
There are two tools which you can use to rewrite large parts of
history automatically: git-filter-branch, and git-fast-export +
git-fast-import.

HTH
-- 
Jakub Narebski
Poland
ShadeHawk on #git
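
As an illustration of the git-filter-branch route, here is a hedged sketch
that drops one file from every commit on every ref. The repository is a
throwaway, and keep.txt and secret.txt are hypothetical names. The
FILTER_BRANCH_SQUELCH_WARNING variable only matters on much newer Git
versions, where it silences a deprecation notice.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # ignored by Git versions that predate it
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example"

echo public  > keep.txt
echo private > secret.txt                 # the file we do not want to publish
git add keep.txt secret.txt
git commit -q -m "add files"
echo more >> keep.txt
git commit -q -a -m "update keep.txt"
old_head=$(git rev-parse HEAD)

# Rewrite every ref, removing secret.txt from the index of each commit.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch secret.txt' -- --all >/dev/null 2>&1

new_head=$(git rev-parse HEAD)
commits=$(git rev-list --count HEAD)
touching=$(git rev-list --count HEAD -- secret.txt)
echo "old HEAD: $old_head"
echo "new HEAD: $new_head"
echo "commits kept: $commits, commits touching secret.txt: $touching"
```

The commit count is preserved while every commit ID changes, consistent
with the point about invalidated SHA-1s. The old objects are still stored
in the repository at this stage; removing them is a separate step.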


* Re: How to remove a commit object?
  2008-10-02 14:02     ` Jakub Narebski
@ 2008-10-02 14:26       ` Klas Lindberg
  2008-10-02 14:30         ` Michael J Gruber
  2008-10-02 15:02         ` Johannes Sixt
  0 siblings, 2 replies; 11+ messages in thread
From: Klas Lindberg @ 2008-10-02 14:26 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Michael J Gruber, Steven Grimm, Git Users List

Repo size is a problem too, actually.

A solution to both problems seemed to be to use git-filter-branch to
create a new repo by filtering out all the unwanted files. The
astonishing result was that, for the subdirectory I tried it on, 90%
or so of the commits on that subdirectory just disappeared. It didn't
look right at all. Although I can't say for sure exactly what I did
with filter-branch, I would appreciate some guidance for using it. It
basically seemed to do exactly what I wanted (recreate the repo, minus
some explicit stuff, with history intact otherwise), except the result
looked crazy.

/Klas


* Re: How to remove a commit object?
  2008-10-02 14:26       ` Klas Lindberg
@ 2008-10-02 14:30         ` Michael J Gruber
  2008-10-02 14:52           ` Klas Lindberg
  2008-10-02 15:02         ` Johannes Sixt
  1 sibling, 1 reply; 11+ messages in thread
From: Michael J Gruber @ 2008-10-02 14:30 UTC (permalink / raw)
  To: Klas Lindberg; +Cc: Jakub Narebski, Steven Grimm, Git Users List

Klas Lindberg venit, vidit, dixit 02.10.2008 16:26:
> Repo size is a problem too, actually.
> 
> A solution to both problems seemed to be to use git-filter-branch to
> create a new repo by filtering out all the unwanted files. The
> astonishing result was that, for the subdirectory I tried it on, 90%
> or so of the commits on that subdirectory just disappeared. It didn't
> look right at all. Although I can't say for sure exactly what I did
> with filter-branch, I would appreciate some guidance for using it. It

I don't know about others, but I would appreciate more info:
Do you want to remove commits (as stated earlier) or files (as stated here)?
What are the boundary conditions? Rewriting history seems to be OK now.

> basically seemed to do exactly what I wanted (recreate the repo, minus
> some explicit stuff, with history intact otherwise), except the result
> looked crazy.

That may be due to the filter-branch incarnation, i.e. which refs did
you rewrite (--all or HEAD)?

Michael


* Re: How to remove a commit object?
  2008-10-02 14:30         ` Michael J Gruber
@ 2008-10-02 14:52           ` Klas Lindberg
  0 siblings, 0 replies; 11+ messages in thread
From: Klas Lindberg @ 2008-10-02 14:52 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Jakub Narebski, Steven Grimm, Git Users List

What I really want is to remove files, but when filter-branch didn't
seem to do what I wanted, I turned to the idea of rewriting single
commits to not include the files in question.

This is what I tried with filter-branch: gitk --all shows about 170
commits on directory D in the repo. Of these, maybe 10 don't lead to
HEAD, but dangle off the main track. As a test, I decided to let
filter-branch create a new repo that only contained the contents of
subdirectory B. So I ran

    git-filter-branch --subdirectory-filter B -- --all

and now the resulting repo has just 14 commits. This is clearly not
what I wanted because a lot of the original history for subdirectory B
is just missing.

Actually, in this particular case I get the exact same result with

    git-filter-branch --subdirectory-filter B HEAD

BR / Klas
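
One way to sanity-check a --subdirectory-filter run is to count the commits
that touch the subdirectory before the rewrite and compare that with the
number of commits afterwards. The sketch below does this on a throwaway
repository with a simple linear history; the directory "other" and the file
names are made up, and histories with merges may not compare this cleanly.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # ignored by Git versions that predate it
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example"

mkdir B other
echo one > B/a.txt
echo x   > other/x.txt
git add B other
git commit -q -m "add B and other"
echo y >> other/x.txt
git commit -q -a -m "touch other only"
echo two >> B/a.txt
git commit -q -a -m "touch B"

# Commits that touch B before the rewrite.
expected=$(git rev-list --count HEAD -- B)

git filter-branch --subdirectory-filter B -- --all >/dev/null 2>&1

# After the rewrite, B is the project root and only its commits remain.
actual=$(git rev-list --count HEAD)
root_entries=$(git ls-tree --name-only HEAD)
echo "commits touching B before: $expected, commits after: $actual"
echo "top-level entries after rewrite: $root_entries"
```

A large gap between the two counts, such as the 167 versus 14 reported in
this thread, suggests the filter dropped history it should have kept.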


* Re: How to remove a commit object?
  2008-10-02 14:26       ` Klas Lindberg
  2008-10-02 14:30         ` Michael J Gruber
@ 2008-10-02 15:02         ` Johannes Sixt
  2008-10-03 11:42           ` Klas Lindberg
  1 sibling, 1 reply; 11+ messages in thread
From: Johannes Sixt @ 2008-10-02 15:02 UTC (permalink / raw)
  To: Klas Lindberg
  Cc: Jakub Narebski, Michael J Gruber, Steven Grimm, Git Users List

Klas Lindberg schrieb:
> A solution to both problems seemed to be to use git-filter-branch to
> create a new repo by filtering out all the unwanted files. The
> astonishing result was that, for the subdirectory I tried it on, 90%
> or so of the commits on that subdirectory just disappeared. It didn't
> look right at all. Although I can't say for sure exactly what I did
> with filter-branch, I would appreciate some guidance for using it. It
> basically seemed to do exactly what I wanted (recreate the repo, minus
> some explicit stuff, with history intact otherwise), except the result
> looked crazy.

And your definition of 'crazy' is...?

I assume that you used --subdirectory-filter. This has issues that will be
fixed in 1.6.1. You need a current 'master' git (at least b805ef08).

-- Hannes


* Re: How to remove a commit object?
  2008-10-02 15:02         ` Johannes Sixt
@ 2008-10-03 11:42           ` Klas Lindberg
  2008-10-03 12:03             ` Johannes Sixt
  0 siblings, 1 reply; 11+ messages in thread
From: Klas Lindberg @ 2008-10-03 11:42 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jakub Narebski, Michael J Gruber, Steven Grimm, Git Users List

On Thu, Oct 2, 2008 at 5:02 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>> with filter-branch, I would appreciate some guidance for using it. It
>> basically seemed to do exactly what I wanted (recreate the repo, minus
>> some explicit stuff, with history intact otherwise), except the result
>> looked crazy.
>
> And your definition of 'crazy' is...?

Right... :-)
Crazy ==  Obviously incorrect behaviour that I didn't analyze. Out of
167 commits on subdirectory B, only 14 survived the filtering.

I tried "git filter-branch --tree-filter 'rm -rf <list of everything
except B>' HEAD" instead, but I can't use that. The change history for
all the non-B paths is still in the repo afterwards, and thus you can
easily recreate any file outside subdirectory B.

Is there some way to do what I need with git-filter-branch today, or
must I wait until 1.6.1 is released?

BR / Klas
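
A possible reason the old paths remain recoverable after a --tree-filter
run is that git-filter-branch keeps the pre-rewrite history under
refs/original/, and the reflogs also still point at it; any branch or tag
that was not rewritten (only HEAD was passed above) holds it as well. The
following is a sketch of the cleanup on a throwaway repository, with made-up
directory and file names, not a recipe verified against the repository
discussed here.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # ignored by Git versions that predate it
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example"

mkdir B other
echo keep    > B/a.txt
echo private > other/secret.txt
git add B other
git commit -q -m "add B and other"
blob=$(git rev-parse HEAD:other/secret.txt)   # object we want gone for good

# Same idea as the tree-filter attempt: delete everything except B.
git filter-branch --tree-filter 'rm -rf other' HEAD >/dev/null 2>&1

# The rewritten history no longer references the blob, but it is still stored.
if git cat-file -e "$blob" 2>/dev/null; then after_rewrite=present; else after_rewrite=gone; fi

# Drop the backup refs, expire the reflogs, then prune unreachable objects.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after_gc=present; else after_gc=gone; fi
echo "after rewrite: $after_rewrite, after cleanup: $after_gc"
```

Cloning the rewritten repository into a fresh one is another way to leave
the unreferenced objects behind, provided every ref that is cloned has been
rewritten.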


* Re: How to remove a commit object?
  2008-10-03 11:42           ` Klas Lindberg
@ 2008-10-03 12:03             ` Johannes Sixt
  0 siblings, 0 replies; 11+ messages in thread
From: Johannes Sixt @ 2008-10-03 12:03 UTC (permalink / raw)
  To: Klas Lindberg
  Cc: Jakub Narebski, Michael J Gruber, Steven Grimm, Git Users List

Klas Lindberg schrieb:
> On Thu, Oct 2, 2008 at 5:02 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>> I assume that you used --subdirectory-filter. This has issues that will be
>> fixed in 1.6.1. You need a current 'master' git (at least b805ef08).
>
> Is there some way to do what I need with git-filter-branch today, or
> must I wait until 1.6.1 is released?

You can remove all occurrences of the "--full-history" flag from your
/usr/libexec/git-core/git-filter-branch script. This is sufficient for
some repositories because this triggers the bug less often. This means
that the resulting history may still be incorrect, but chances are higher
that it is correct.

Other than that, you can just clone git.git and compile it yourself. It's
a simple matter of "make prefix=$HOME/mytempgit install".

-- Hannes

