git.vger.kernel.org archive mirror
* how to remove unreachable objects?
@ 2011-09-19  9:08 dieter
  2011-09-19 19:53 ` Jeff King
  2011-09-19 20:36 ` Andreas Schwab
  0 siblings, 2 replies; 9+ messages in thread
From: dieter @ 2011-09-19  9:08 UTC (permalink / raw)
  To: git

hi,

i am relatively new to git, and am currently trying to get used to it.

at the moment i am exploring how to remove unneeded objects, this
should be possible with prune, gc and/or fsck.
maybe i have not found the right combination or something in my
understanding is not correct.

this is my use case:
i create a repository and produce several commits on master.
then i go back to a certain tag and create a new branch, where i also
commit.
then i switch back to master and delete (-D) the other branch.
it should now be unreachable from within git (to prove its existence,
i remember a commit SHA1 on the dead branch).
then i try to get rid of the unreachable objects with a series of
prune, fsck and gc.

-------------
schoen.d@ax:~/projects/gitFeatures$ cat mk_dead_end.sh
#!/bin/sh

DEAD=dead_end

rm -rf $DEAD
mkdir $DEAD
cd $DEAD
git init
echo "first commit" > A
git add A
git commit -m "first commit"
git tag first_commit
echo "second commit" >> A
git add A
git commit -m "second commit"
git checkout first_commit
echo "commit in dead end" >> A
git add A
git commit -m "changed A in dead end"
git checkout -b $DEAD
dead_commit=`git log -1 --format="%H"`
git checkout master
git branch -D $DEAD
git show $dead_commit
git fsck --unreachable --full --verbose
git fsck --unreachable HEAD \
                    $(git for-each-ref --format="%(objectname)" refs/heads)
git fsck --lost-found
git prune -v $dead_commit
git prune $(git rev-parse --all)
git repack
git prune-packed
git gc --prune=now
git gc --aggressive
git show $dead_commit


------
if you look at the output of this script then you see that git knows
that there are unreachable/dangling objects, but they remain.

thankful for any pointer,
dieter


* Re: how to remove unreachable objects?
  2011-09-19  9:08 how to remove unreachable objects? dieter
@ 2011-09-19 19:53 ` Jeff King
  2011-09-19 20:18   ` Jeff King
  2011-09-24 22:11   ` Dieter Schön
  2011-09-19 20:36 ` Andreas Schwab
  1 sibling, 2 replies; 9+ messages in thread
From: Jeff King @ 2011-09-19 19:53 UTC (permalink / raw)
  To: dieter; +Cc: git

On Mon, Sep 19, 2011 at 11:08:31AM +0200, dieter@schoen.or.at wrote:

> this is my use case:
> i create a repository and produce several commits on master.
> then i go back to a certain tag and create a new branch, where i also
> commit.
> then i switch back to master and delete (-D) the other branch.
> it should now be unreachable from within git (to prove its existence,
> i remember a commit SHA1 on the dead branch).

It will still be referenced by the HEAD reflog, won't it?
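
One way to check this is to look for the commit in the HEAD reflog. The helper below is a minimal sketch; the name `in_head_reflog` is made up for illustration and is not a git command.

```shell
# Hypothetical helper: succeeds if the given commit appears as an
# entry in the HEAD reflog of the current repository.
in_head_reflog() {
    # "git log -g" walks reflog entries instead of commit ancestry;
    # --format=%H prints the commit id recorded by each entry.
    git log -g --format=%H HEAD | grep -q -x "$(git rev-parse "$1")"
}
```

With the script from the original message, `in_head_reflog "$dead_commit"` should still succeed after `git branch -D`, because the commit was made and checked out through HEAD.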

> git checkout master
> git branch -D $DEAD
> git show $dead_commit
> git fsck --unreachable --full --verbose

This shows it reachable, because it is connected from the HEAD reflog.

> git fsck --unreachable HEAD \
>                     $(git for-each-ref --format="%(objectname)" refs/heads)

And this shows it as unreachable, because you are asking git to only
look at the branch tips and HEAD (by default, it looks at all refs and
reflogs).

I suspect you copied this straight from the git-fsck manpage. That
advice is a bit outdated, I think. It blames (in some form) all the way
back to the original documentation added in c64b9b8 (2005-05-05, only a
few weeks after git was born). A few weeks later, fsck learned to
default to looking at all refs (1024932, 2005-05-18). And then other
sane defaults like reflogs got tacked on later (reflogs came around the
1.4.x era, in 2006).
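
The effect of the reflog on the default root set can be seen without listing heads by hand, assuming a git version that has the `--no-reflogs` option of git-fsck. The wrapper name `unreachable_commits` below is only for illustration.

```shell
# Hypothetical wrapper: print the ids of commits that git-fsck reports
# as unreachable. Extra arguments are passed through to git-fsck.
unreachable_commits() {
    git fsck --unreachable "$@" 2>/dev/null |
        awk '$1 == "unreachable" && $2 == "commit" { print $3 }'
}

# Default roots (refs, index, reflogs): a commit that is only in a
# reflog is treated as reachable and is not listed.
#   unreachable_commits
# Reflogs ignored: the same commit is now listed.
#   unreachable_commits --no-reflogs
```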

> git fsck --lost-found
> git prune -v $dead_commit
> git prune $(git rev-parse --all)
> git repack
> git prune-packed
> git gc --prune=now
> git gc --aggressive
> git show $dead_commit

If you really want to make it unreachable, you should expire the
reflogs, too:

  git reflog expire --expire=now --all
  # will now report unreachable
  git fsck --unreachable
  # will now actually delete objects
  git prune -v
  # gives "bad object ..."
  git show $dead_commit

git-gc will do this for you, but of course the default expiration time
is much longer (I think something like 90 days).
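
The sequence above could be wrapped up as follows. This is a sketch only; the function name `purge_unreachable` is an assumption, and the operation is destructive, so it is best tried on a throwaway clone first.

```shell
# Hypothetical helper: drop every reflog entry, then delete all loose
# objects that are no longer reachable. There is no undo after this.
purge_unreachable() {
    git reflog expire --expire=now --all &&
    git prune -v
}
```

After it has run, `git show $dead_commit` is expected to fail with a "bad object" error. The expiration times that git-gc uses by default are controlled by the `gc.reflogExpire` and `gc.reflogExpireUnreachable` configuration variables.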

-Peff


* Re: how to remove unreachable objects?
  2011-09-19 19:53 ` Jeff King
@ 2011-09-19 20:18   ` Jeff King
  2011-09-19 21:38     ` Junio C Hamano
  2011-09-24 22:11   ` Dieter Schön
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2011-09-19 20:18 UTC (permalink / raw)
  To: dieter; +Cc: Junio C Hamano, git

On Mon, Sep 19, 2011 at 03:53:36PM -0400, Jeff King wrote:

> > git fsck --unreachable HEAD \
> >                     $(git for-each-ref --format="%(objectname)" refs/heads)
> 
> And this shows it as unreachable, because you are asking git to only
> look at the branch tips and HEAD (by default, it looks at all refs and
> reflogs).
> 
> I suspect you copied this straight from the git-fsck manpage. That
> advice is a bit outdated, I think. It blames (in some form) all the way
> back to the original documentation added in c64b9b8 (2005-05-05, only a
> few weeks after git was born). A few weeks later, fsck learned to
> default to looking at all refs (1024932, 2005-05-18). And then other
> sane defaults like reflogs got tacked on later (reflogs came around the
> 1.4.x era, in 2006).

So we should probably do something like this:

-- >8 --
Subject: [PATCH] docs: brush up obsolete bits of git-fsck manpage

After the description and options, the fsck manpage contains
some discussion about what it does. Over time, this
discussion has become somewhat obsolete, both in content and
formatting. In particular:

  1. There are many options now, so starting the discussion
     with "It tests..." makes it unclear whether we are
     talking about the last option, or about the tool in
     general. Let's start a new "discussion" section and
     make our antecedent more clear.

  2. It gave an example for --unreachable using for-each-ref
     to mention all of the heads, saying that it will do "a
     _lot_ of verification". This is hopelessly out-of-date,
     as giving no arguments will check much more (reflogs,
     the index, non-head refs).

  3. It goes on to mention tests "to be added" (like tree
     object sorting). We now have these tests.

Signed-off-by: Jeff King <peff@peff.net>
---
I was tempted to just drop this section entirely. It's mostly redundant
with the DESCRIPTION section, and any extra details could be folded in
there. The most useful bit is the "what do you do when there is
corruption". But that should perhaps get its own section, if somebody
feels like writing something more detailed (I thought we had a guide
somewhere, but I couldn't find it).

 Documentation/git-fsck.txt |   26 ++++++++------------------
 1 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index a2a508d..55b33d7 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -72,30 +72,20 @@ index file, all SHA1 references in .git/refs/*, and all reflogs (unless
 	a blob, the contents are written into the file, rather than
 	its object name.
 
-It tests SHA1 and general object sanity, and it does full tracking of
-the resulting reachability and everything else. It prints out any
-corruption it finds (missing or bad objects), and if you use the
-'--unreachable' flag it will also print out objects that exist but
-that aren't reachable from any of the specified head nodes.
-
-So for example
-
-	git fsck --unreachable HEAD \
-		$(git for-each-ref --format="%(objectname)" refs/heads)
+DISCUSSION
+----------
 
-will do quite a _lot_ of verification on the tree. There are a few
-extra validity tests to be added (make sure that tree objects are
-sorted properly etc), but on the whole if 'git fsck' is happy, you
-do have a valid tree.
+git-fsck tests SHA1 and general object sanity, and it does full tracking
+of the resulting reachability and everything else. It prints out any
+corruption it finds (missing or bad objects), and if you use the
+'--unreachable' flag it will also print out objects that exist but that
+aren't reachable from any of the specified head nodes (or the default
+set, as mentioned above).
 
 Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
-Of course, "valid tree" doesn't mean that it wasn't generated by some
-evil person, and the end result might be crap. git is a revision
-tracking system, not a quality assurance system ;)
-
 Extracted Diagnostics
 ---------------------
 
-- 
1.7.7.rc1.3.gb95be


* Re: how to remove unreachable objects?
  2011-09-19  9:08 how to remove unreachable objects? dieter
  2011-09-19 19:53 ` Jeff King
@ 2011-09-19 20:36 ` Andreas Schwab
  1 sibling, 0 replies; 9+ messages in thread
From: Andreas Schwab @ 2011-09-19 20:36 UTC (permalink / raw)
  To: dieter; +Cc: git

dieter@schoen.or.at writes:

> if you look at the output of this script then you see that git knows
> that there are unreachable/dangling objects, but they remain.

You also need to prune the reflog of HEAD.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: how to remove unreachable objects?
  2011-09-19 20:18   ` Jeff King
@ 2011-09-19 21:38     ` Junio C Hamano
  2011-09-19 22:52       ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2011-09-19 21:38 UTC (permalink / raw)
  To: Jeff King; +Cc: dieter, git

Jeff King <peff@peff.net> writes:

> I was tempted to just drop this section entirely. It's mostly redundant
> with the DESCRIPTION section, and any extra details could be folded in
> there. The most useful bit is the "what do you do when there is
> corruption". But that should perhaps get its own section, if somebody
> feels like writing something more detailed (I thought we had a guide
> somewhere, but I couldn't find it).

Yeah, I've been thinking about making it an error to give refs to fsck, as
I do not think the use cases for the feature justify the possible confusion
it may cause.

One possible use case might be when your repository is corrupt, and does
not pass "git fsck" (without any argument).  In such a case, if you are
lucky and your disk corrupted objects only reachable from a recent topic
branch, you might find that this command:

	$ git fsck master next ...list other topics here...

still succeeds, so that you can figure out which topic makes such a
limited fsck fail when it is listed on the command line, judge its
importance and resurrect what you can from there, before nuking it to
bring the repository back to health so that you can recreate the topic.


* Re: how to remove unreachable objects?
  2011-09-19 21:38     ` Junio C Hamano
@ 2011-09-19 22:52       ` Jeff King
  2011-09-20  0:40         ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2011-09-19 22:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: dieter, git

On Mon, Sep 19, 2011 at 02:38:41PM -0700, Junio C Hamano wrote:

> Yeah, I've been thinking about making it an error to give refs to fsck, as
> I do not think the use cases for the feature justify the possible confusion
> it may cause.
>
> One possible use case might be when your repository is corrupt, and does
> not pass "git fsck" (without any argument).  In such a case, if you are
> lucky and your disk corrupted objects only reachable from a recent topic
> branch, you might find that this command:
> 
> 	$ git fsck master next ...list other topics here...
> 
> still succeeds, so that you can figure out which topic makes such a
> limited fsck fail when it is listed on the command line, judge its
> importance and resurrect what you can from there, before nuking it to
> bring the repository back to health so that you can recreate the topic.

Does that work? I had the impression from the documentation that the
arguments are purely about the reachability analysis, and that the
actual corruption/correctness checks actually look through the object db
directly, making sure each object is well-formed. Skimming cmd_fsck
seems to confirm that.

-Peff


* Re: how to remove unreachable objects?
  2011-09-19 22:52       ` Jeff King
@ 2011-09-20  0:40         ` Junio C Hamano
  2011-09-20  0:51           ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2011-09-20  0:40 UTC (permalink / raw)
  To: Jeff King; +Cc: dieter, git

Jeff King <peff@peff.net> writes:

>> One possible use case might be when your repository is corrupt, and does
>> not pass "git fsck" (without any argument).  In such a case, if you are
>> lucky and your disk corrupted objects only reachable from a recent topic
>> branch, you might find that this command:
>> 
>> 	$ git fsck master next ...list other topics here...
>> 
>> still succeeds, so that you can figure out which topic makes such a
>> limited fsck fail when it is listed on the command line, judge its
>> importance and resurrect what you can from there, before nuking it to
>> bring the repository back to health so that you can recreate the topic.
>
> Does that work? I had the impression from the documentation that the
> arguments are purely about the reachability analysis, and that the
> actual corruption/correctness checks actually look through the object db
> directly, making sure each object is well-formed. Skimming cmd_fsck
> seems to confirm that.

You are right that you may see "corrupt object" for objects in the object
store that are unreachable from the tips, but I was talking more about
verifying that everything needed for reachability analysis from the given
tips can be read, iow, "missing object" errors, lack of which would mean you
can salvage everything reachable from the refs involved in the traversal.


* Re: how to remove unreachable objects?
  2011-09-20  0:40         ` Junio C Hamano
@ 2011-09-20  0:51           ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2011-09-20  0:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: dieter, git

On Mon, Sep 19, 2011 at 05:40:03PM -0700, Junio C Hamano wrote:

> > Does that work? I had the impression from the documentation that the
> > arguments are purely about the reachability analysis, and that the
> > actual corruption/correctness checks actually look through the object db
> > directly, making sure each object is well-formed. Skimming cmd_fsck
> > seems to confirm that.
> 
> You are right that you may see "corrupt object" for objects in the object
> store that are unreachable from the tips, but I was talking more about
> verifying that everything needed for reachability analysis from the given
> tips can be read, iow, "missing object" errors, lack of which would mean you
> can salvage everything reachable from the refs involved in the traversal.

True. Though one could also do that with "git log", and it would be much
cheaper (since each trial you run with git-fsck is going to actually
fsck the object db, which is expensive).
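
A per-branch traversal along those lines might look like the sketch below. It swaps in `git rev-list --objects` for plain "git log" so that trees and blobs are visited as well as commits; the function name `broken_branches` is made up for illustration.

```shell
# Hypothetical helper: print each local branch whose history cannot be
# fully traversed, e.g. because an object it needs is missing.
broken_branches() {
    git for-each-ref --format='%(refname)' refs/heads |
    while read -r ref; do
        # rev-list exits non-zero when the walk hits a missing object.
        git rev-list --objects "$ref" >/dev/null 2>&1 || echo "$ref"
    done
}
```

This only checks that the objects needed for the walk can be found. It does not verify object contents the way git-fsck does.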

I can't help but think the right solution there is something like:

  1. If the corrupted or missing object is a blob or tree, figure out
     which commits reference it with something like:

       a. Create a set B of bad objects (blobs or trees).

       b. For each tree in the object db, open and see if it contains
          any elements of B. If so, add the tree to another set, B'.

       c. If B' is empty, done. Otherwise, add elements from B' to B and
          goto step (b).

       d. For each commit in the object db, open and check the tree
          pointer. If it points to an element of B, then the commit is
          bad.

  2. If the object is a commit, or if you arrived at a set of bad
     commits through step (1), then use "branch --contains" on the
     bad commits.
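
Steps (a) through (d) could be sketched in shell roughly as follows. This is an untuned illustration, not an existing git command: the name `commits_referencing` is hypothetical, it assumes a git new enough to have `cat-file --batch-all-objects`, and it adds trees to B as soon as they are found instead of collecting a separate B' per pass, which reaches the same fixed point.

```shell
# Hypothetical helper: print every commit in the object db whose tree
# (directly or through subtrees) contains one of the given object ids.
commits_referencing() {
    bad=$(mktemp) && objects=$(mktemp) || return 1
    printf '%s\n' "$@" > "$bad"                  # step (a): the set B
    # One line per object in the db: "<id> <type> <size>"
    git cat-file --batch-all-objects --batch-check > "$objects"

    # Steps (b) and (c): add trees that contain an element of B, and
    # repeat until a full pass adds nothing new.
    added=1
    while [ "$added" -eq 1 ]; do
        added=0
        for tree in $(awk '$2 == "tree" { print $1 }' "$objects"); do
            grep -q -x -F "$tree" "$bad" && continue
            if git ls-tree "$tree" 2>/dev/null | awk '{ print $3 }' |
               grep -q -x -F -f "$bad"; then
                echo "$tree" >> "$bad"
                added=1
            fi
        done
    done

    # Step (d): a commit is bad if its tree pointer is in B.
    for commit in $(awk '$2 == "commit" { print $1 }' "$objects"); do
        tree=$(git rev-parse --verify -q "$commit^{tree}") || continue
        grep -q -x -F "$tree" "$bad" && echo "$commit"
    done
    rm -f "$bad" "$objects"
}
```

For step 2, each printed commit can then be passed to `git branch --contains <commit>`.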

which is algorithmically efficient (though probably slow if you had to
cat-file each tree). It might be a handy special command, though (I have
seen people ask for "which part of history references this blob" on
occasion). I've never bothered writing it because I've never had a
corrupt object. :)

Anyway, that is perhaps not relevant to your point. But I do think that
fsck with arguments is more likely to confuse someone than to actually
be part of a productive use-case. I have no problem with deprecating or
removing it.

-Peff


* Re: how to remove unreachable objects?
  2011-09-19 19:53 ` Jeff King
  2011-09-19 20:18   ` Jeff King
@ 2011-09-24 22:11   ` Dieter Schön
  1 sibling, 0 replies; 9+ messages in thread
From: Dieter Schön @ 2011-09-24 22:11 UTC (permalink / raw)
  To: git


On 19.09.2011 at 21:53, Jeff King wrote:

> On Mon, Sep 19, 2011 at 11:08:31AM +0200, dieter@schoen.or.at wrote:
> 
>> this is my use case:
>> i create a repository and produce several commits on master.
>> then i go back to a certain tag and create a new branch, where i also
>> commit.
>> then i switch back to master and delete (-D) the other branch.
>> it should now be unreachable from within git (to prove its existence,
>> i remember a commit SHA1 on the dead branch).
> 
> It will still be referenced by the HEAD reflog, won't it?

thanks to all that answered!
it was very helpful and i gained a bit more insight.

kind regards,
dieter
