* how to remove unreachable objects?
From: dieter @ 2011-09-19  9:08 UTC
To: git

hi,

i am relatively new to git, and am currently trying to get used to it.
at the moment i am exploring how to remove unneeded objects.
this should be possible with prune, gc and/or fsck.
maybe i have not found the right combination, or something in my
understanding is not correct.

this is my use case:
i create a repository and produce several commits on master.
then i go back to a certain tag and create a new branch, where i also
commit.
then i switch back to master and delete (-D) the other branch.
it should now be unreachable from within git (to prove its existence,
i remember a commit SHA1 on the dead branch).
then i try to get rid of the unreachable objects with a series of
prune, fsck and gc.

-------------
schoen.d@ax:~/projects/gitFeatures$ cat mk_dead_end.sh
#!/bin/sh

DEAD=dead_end
rm -rf $DEAD
mkdir $DEAD
cd $DEAD
git init
echo "first commit" > A
git add A
git commit -m "first commit"
git tag first_commit
echo "second commit" >> A
git add A
git commit -m "second commit"
git checkout first_commit
echo "commit in dead end" >> A
git add A
git commit -m "changed A in dead end"
git checkout -b $DEAD
dead_commit=`git log -1 --format="%H"`
git checkout master
git branch -D $DEAD
git show $dead_commit
git fsck --unreachable --full --verbose
git fsck --unreachable HEAD \
    $(git for-each-ref --format="%(objectname)" refs/heads)
git fsck --lost-found
git prune -v $dead_commit
git prune $(git rev-parse --all)
git repack
git prune-packed
git gc --prune=now
git gc --aggressive
git show $dead_commit
------

if you look at the output of this script then you see that git knows
that there are unreachable/dangling objects, but they remain.

thankful for any pointer,
dieter

* Re: how to remove unreachable objects?
From: Jeff King @ 2011-09-19 19:53 UTC
To: dieter; +Cc: git

On Mon, Sep 19, 2011 at 11:08:31AM +0200, dieter@schoen.or.at wrote:

> this is my use case:
> i create a repository and produce several commits on master.
> then i go back to a certain tag and create a new branch, where i also
> commit.
> then i switch back to master and delete (-D) the other branch.
> it should now be unreachable from within git (to prove its existence,
> i remember a commit SHA1 on the dead branch).

It will still be referenced by the HEAD reflog, won't it?

> git checkout master
> git branch -D $DEAD
> git show $dead_commit
> git fsck --unreachable --full --verbose

This shows it reachable, because it is connected from the HEAD reflog.

> git fsck --unreachable HEAD \
>     $(git for-each-ref --format="%(objectname)" refs/heads)

And this shows it as unreachable, because you are asking git to only
look at the branch tips and HEAD (by default, it looks at all refs and
reflogs).

I suspect you copied this straight from the git-fsck manpage. That
advice is a bit outdated, I think. It blames (in some form) all the way
back to the original documentation added in c64b9b8 (2005-05-05, only a
few weeks after git was born). A few weeks later, fsck learned to
default to looking at all refs (1024932, 2005-05-18). And then other
sane defaults like reflogs got tacked on later (reflogs came around the
1.4.x era, in 2006).

> git fsck --lost-found
> git prune -v $dead_commit
> git prune $(git rev-parse --all)
> git repack
> git prune-packed
> git gc --prune=now
> git gc --aggressive
> git show $dead_commit

If you really want to make it unreachable, you should expire the
reflogs, too:

  git reflog expire --expire=now --all

  # will now report unreachable
  git fsck --unreachable

  # will now actually delete objects
  git prune -v

  # gives "bad object ..."
  git show $dead_commit

git-gc will do this for you, but of course the default expiration time
is much longer (I think something like 90 days).

-Peff
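
[Editor's sketch] The sequence described in the message above can be tried end to end in a scratch repository. This is only an illustration under stated assumptions: the temporary directory, the placeholder identity and the `before`/`after` variables are invented for the example, and it assumes a reasonably recent Git in which `--expire=now` drops every reflog entry. Only the git commands themselves come from the thread.

```shell
#!/bin/sh
# Sketch: a deleted branch's commit survives "git prune" until the reflog
# entries that mention it are expired. Everything runs in a throwaway dir.
set -e

repo=$(mktemp -d)                           # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"              # placeholder identity
git config user.email "example@example.invalid"

echo "first commit" > A
git add A
git commit -q -m "first commit"
base=$(git symbolic-ref --short HEAD)       # "master" or "main", per local config

# Create a commit on a side branch, then delete that branch.
git checkout -q -b dead_end
echo "commit in dead end" >> A
git commit -q -a -m "changed A in dead end"
dead_commit=$(git rev-parse HEAD)
git checkout -q "$base"
git branch -q -D dead_end

# First attempt: the HEAD reflog still references the commit.
git prune
if git cat-file -e "$dead_commit" 2>/dev/null; then before=present; else before=gone; fi

# Second attempt: expire all reflog entries, then prune again.
git reflog expire --expire=now --all
git prune
if git cat-file -e "$dead_commit" 2>/dev/null; then after=present; else after=gone; fi

echo "before reflog expiry: $before"
echo "after reflog expiry:  $after"
```

On a recent Git the first line is expected to say `present` and the second `gone`. For context, the gc defaults that keep such objects around are `gc.reflogExpire` (90 days), `gc.reflogExpireUnreachable` (30 days) and `gc.pruneExpire` (2 weeks).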

* Re: how to remove unreachable objects?
From: Jeff King @ 2011-09-19 20:18 UTC
To: dieter; +Cc: Junio C Hamano, git

On Mon, Sep 19, 2011 at 03:53:36PM -0400, Jeff King wrote:

> > git fsck --unreachable HEAD \
> >     $(git for-each-ref --format="%(objectname)" refs/heads)
>
> And this shows it as unreachable, because you are asking git to only
> look at the branch tips and HEAD (by default, it looks at all refs and
> reflogs).
>
> I suspect you copied this straight from the git-fsck manpage. That
> advice is a bit outdated, I think. It blames (in some form) all the way
> back to the original documentation added in c64b9b8 (2005-05-05, only a
> few weeks after git was born). A few weeks later, fsck learned to
> default to looking at all refs (1024932, 2005-05-18). And then other
> sane defaults like reflogs got tacked on later (reflogs came around the
> 1.4.x era, in 2006).

So we should probably do something like this:

-- >8 --
Subject: [PATCH] docs: brush up obsolete bits of git-fsck manpage

After the description and options, the fsck manpage contains
some discussion about what it does. Over time, this
discussion has become somewhat obsolete, both in content and
formatting. In particular:

  1. There are many options now, so starting the discussion
     with "It tests..." makes it unclear whether we are
     talking about the last option, or about the tool in
     general. Let's start a new "discussion" section and
     make our antecedent more clear.

  2. It gave an example for --unreachable using for-each-ref
     to mention all of the heads, saying that it will do "a
     _lot_ of verification". This is hopelessly out-of-date,
     as giving no arguments will check much more (reflogs,
     the index, non-head refs).

  3. It goes on to mention tests "to be added" (like tree
     object sorting). We now have these tests.

Signed-off-by: Jeff King <peff@peff.net>
---
I was tempted to just drop this section entirely. It's mostly redundant
with the DESCRIPTION section, and any extra details could be folded in
there. The most useful bit is the "what do you do when there is
corruption". But that should perhaps get its own section, if somebody
feels like writing something more detailed (I thought we had a guide
somewhere, but I couldn't find it).

 Documentation/git-fsck.txt |   26 ++++++++------------------
 1 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index a2a508d..55b33d7 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -72,30 +72,20 @@ index file, all SHA1 references in .git/refs/*, and all reflogs (unless
 	a blob, the contents are written into the file, rather than
 	its object name.
 
-It tests SHA1 and general object sanity, and it does full tracking of
-the resulting reachability and everything else. It prints out any
-corruption it finds (missing or bad objects), and if you use the
-'--unreachable' flag it will also print out objects that exist but
-that aren't reachable from any of the specified head nodes.
-
-So for example
-
-	git fsck --unreachable HEAD \
-		$(git for-each-ref --format="%(objectname)" refs/heads)
+DISCUSSION
+----------
 
-will do quite a _lot_ of verification on the tree. There are a few
-extra validity tests to be added (make sure that tree objects are
-sorted properly etc), but on the whole if 'git fsck' is happy, you
-do have a valid tree.
+git-fsck tests SHA1 and general object sanity, and it does full tracking
+of the resulting reachability and everything else. It prints out any
+corruption it finds (missing or bad objects), and if you use the
+'--unreachable' flag it will also print out objects that exist but that
+aren't reachable from any of the specified head nodes (or the default
+set, as mentioned above).
 
 Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
-Of course, "valid tree" doesn't mean that it wasn't generated by some
-evil person, and the end result might be crap. git is a revision
-tracking system, not a quality assurance system ;)
-
 Extracted Diagnostics
 ---------------------
 
-- 
1.7.7.rc1.3.gb95be

* Re: how to remove unreachable objects?
From: Junio C Hamano @ 2011-09-19 21:38 UTC
To: Jeff King; +Cc: dieter, git

Jeff King <peff@peff.net> writes:

> I was tempted to just drop this section entirely. It's mostly redundant
> with the DESCRIPTION section, and any extra details could be folded in
> there. The most useful bit is the "what do you do when there is
> corruption". But that should perhaps get its own section, if somebody
> feels like writing something more detailed (I thought we had a guide
> somewhere, but I couldn't find it).

Yeah, I've been thinking about making it an error to give refs to fsck, as
I do not think the use cases for the feature justify the possible
confusion it may cause.

One possible use case might be when your repository is corrupt, and does
not pass "git fsck" (without any argument). In such a case, if you are
lucky and your disk corrupted objects only reachable from a recent topic
branch, you might find that this command:

    $ git fsck master next ...list other topics here...

still succeeds, so that you can figure out which topic makes such a
limited fsck fail when it is listed on the command line, judge its
importance and resurrect what you can from there, before nuking it to
bring the repository back in health so that you can recreate the topic.

* Re: how to remove unreachable objects?
From: Jeff King @ 2011-09-19 22:52 UTC
To: Junio C Hamano; +Cc: dieter, git

On Mon, Sep 19, 2011 at 02:38:41PM -0700, Junio C Hamano wrote:

> Yeah, I've been thinking about making it an error to give refs to fsck, as
> I do not think the use cases for the feature justify the possible
> confusion it may cause.
>
> One possible use case might be when your repository is corrupt, and does
> not pass "git fsck" (without any argument). In such a case, if you are
> lucky and your disk corrupted objects only reachable from a recent topic
> branch, you might find that this command:
>
>     $ git fsck master next ...list other topics here...
>
> still succeeds, so that you can figure out which topic makes such a
> limited fsck fail when it is listed on the command line, judge its
> importance and resurrect what you can from there, before nuking it to
> bring the repository back in health so that you can recreate the topic.

Does that work? I had the impression from the documentation that the
arguments are purely about the reachability analysis, and that the
actual corruption/correctness checks actually look through the object db
directly, making sure each object is well-formed. Skimming cmd_fsck
seems to confirm that.

-Peff

* Re: how to remove unreachable objects?
From: Junio C Hamano @ 2011-09-20  0:40 UTC
To: Jeff King; +Cc: dieter, git

Jeff King <peff@peff.net> writes:

>> One possible use case might be when your repository is corrupt, and does
>> not pass "git fsck" (without any argument). In such a case, if you are
>> lucky and your disk corrupted objects only reachable from a recent topic
>> branch, you might find that this command:
>>
>>     $ git fsck master next ...list other topics here...
>>
>> still succeeds, so that you can figure out which topic makes such a
>> limited fsck fail when it is listed on the command line, judge its
>> importance and resurrect what you can from there, before nuking it to
>> bring the repository back in health so that you can recreate the topic.
>
> Does that work? I had the impression from the documentation that the
> arguments are purely about the reachability analysis, and that the
> actual corruption/correctness checks actually look through the object db
> directly, making sure each object is well-formed. Skimming cmd_fsck
> seems to confirm that.

You are right that you may see "corrupt object" for objects in the
object store that are unreachable from the tips, but I was talking more
about verifying that everything needed for reachability analysis from
the given tips can be read, iow, "missing object" errors, lack of which
would mean you can salvage everything reachable from the refs involved
in the traversal.
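
[Editor's sketch] The per-topic check described in this message could be automated roughly as follows. This is a hedged illustration, not something posted in the thread: the function name `check_tips` and its output format are invented, and it relies on the behaviour discussed above, namely that `git fsck <tip>` reports "missing object" errors only for objects reachable from the tips given on the command line. Each call still scans the whole object database, so it can be slow on a large repository.

```shell
#!/bin/sh
# check_tips (hypothetical helper): run a reachability-limited fsck for
# every local branch and report which tips pass and which fail.
check_tips() {
    git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r tip; do
        # fsck exits non-zero when it finds errors such as a missing
        # object reachable from the given tip.
        if git fsck "$tip" >/dev/null 2>&1; then
            echo "ok   $tip"
        else
            echo "FAIL $tip"
        fi
    done
}
```

Run `check_tips` inside the damaged repository; branches reported as `ok` should be safe to salvage as they are, while `FAIL` marks the topics that need a closer look.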

* Re: how to remove unreachable objects?
From: Jeff King @ 2011-09-20  0:51 UTC
To: Junio C Hamano; +Cc: dieter, git

On Mon, Sep 19, 2011 at 05:40:03PM -0700, Junio C Hamano wrote:

> > Does that work? I had the impression from the documentation that the
> > arguments are purely about the reachability analysis, and that the
> > actual corruption/correctness checks actually look through the object db
> > directly, making sure each object is well-formed. Skimming cmd_fsck
> > seems to confirm that.
>
> You are right that you may see "corrupt object" for objects in the
> object store that are unreachable from the tips, but I was talking more
> about verifying that everything needed for reachability analysis from
> the given tips can be read, iow, "missing object" errors, lack of which
> would mean you can salvage everything reachable from the refs involved
> in the traversal.

True. Though one could also do that with "git log", and it would be much
cheaper (since each trial you run with git-fsck is going to actually
fsck the object db, which is expensive).

I can't help but think the right solution there is something like:

  1. If the corrupted or missing object is a blob or tree, figure out
     which commits reference it with something like:

       a. Create a set B of bad objects (blobs or trees).

       b. For each tree in the object db, open and see if it contains
          any elements of B. If so, add the tree to another set, B'.

       c. If B' is empty, done. Otherwise, add elements from B' to B
          and goto step (b).

       d. For each commit in the object db, open and check the tree
          pointer. If it points to an element of B, then the commit is
          bad.

  2. If the object is a commit, or if you arrived at a set of bad
     commits through step (1), then use "branch --contains" on the bad
     commits.

which is algorithmically efficient (though probably slow if you had to
cat-file each tree). It might be a handy special command, though (I have
seen people ask for "which part of history references this blob" on
occasion). I've never bothered writing it because I've never had a
corrupt object. :)

Anyway, that is perhaps not relevant to your point. But I do think that
fsck with arguments is more likely to confuse someone than to actually
be part of a productive use-case. I have no problem with deprecating or
removing it.

-Peff
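
[Editor's sketch] A rough shell approximation of the "which part of history references this object" idea. It is not the bottom-up set-growing algorithm outlined in the message: as a simpler stand-in, it walks every commit reachable from any ref and lists that commit's full tree, which is slower but short. The function name `commits_referencing` is invented for the example, and the sketch assumes the tree objects themselves are still readable, so it suits a bad blob better than a missing tree.

```shell
#!/bin/sh
# commits_referencing OBJECT (hypothetical helper): print each commit,
# reachable from any ref, whose tree mentions OBJECT either as the root
# tree or as an entry anywhere below it.
commits_referencing() {
    bad=$1
    git rev-list --all |
    while read -r commit; do
        # ls-tree prints "<mode> <type> <object><TAB><path>"; -r recurses
        # into subtrees and -t also lists the subtree entries themselves.
        if [ "$(git rev-parse "$commit^{tree}")" = "$bad" ] ||
           git ls-tree -r -t "$commit" | grep -q "[[:space:]]$bad[[:space:]]"
        then
            echo "$commit"
        fi
    done
}
```

Each commit it prints can then be handed to `git branch --contains <commit>`, as in step 2 of the message, to see which branches are affected. The argument must be a full object name, since the match is done on the text of the `ls-tree` output.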

* Re: how to remove unreachable objects?
From: Dieter Schön @ 2011-09-24 22:11 UTC
To: git

On 19.09.2011 at 21:53, Jeff King wrote:

> On Mon, Sep 19, 2011 at 11:08:31AM +0200, dieter@schoen.or.at wrote:
>
>> this is my use case:
>> i create a repository and produce several commits on master.
>> then i go back to a certain tag and create a new branch, where i also
>> commit.
>> then i switch back to master and delete (-D) the other branch.
>> it should now be unreachable from within git (to prove its existence,
>> i remember a commit SHA1 on the dead branch).
>
> It will still be referenced by the HEAD reflog, won't it?

thanks to all that answered!
it was very helpful and i gained a bit more insight.

kind regards,
dieter

* Re: how to remove unreachable objects?
From: Andreas Schwab @ 2011-09-19 20:36 UTC
To: dieter; +Cc: git

dieter@schoen.or.at writes:

> if you look at the output of this script then you see that git knows
> that there
> are unreachable/dangling objects, but they remain.

You also need to prune the reflog of HEAD.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."