* should git download missing objects?
From: Anand Kumria @ 2006-11-12 15:44 UTC
To: git

Hi,

I did an initial clone of Linus' linux-2.6.git tree, via the git
protocol, and then managed to accidentally delete one of the .pack
files and its corresponding .idx file.

I thought that 'cg-fetch' would do the job of bringing down the
missing pack again, and all would be well. Alas, this isn't the case.

<http://pastebin.ca/246678>

Pasky, on IRC, indicated that this might be because git-fetch-pack
isn't downloading missing objects when the git:// protocol is being
used. Should it?

Is there a magic invocation of git fetch I can use to fix this up? I
can always re-clone completely (since this is just a tracking repo),
but it would be nice to fix this with the tools themselves.

Any hints?

Thanks,
Anand

* Re: should git download missing objects?
From: Junio C Hamano @ 2006-11-12 19:41 UTC
To: Anand Kumria; +Cc: git

"Anand Kumria" <wildfire@progsoc.org> writes:

> I did an initial clone of Linus' linux-2.6.git tree, via the git
> protocol, and then managed to accidentally delete one of the .pack
> files and its corresponding .idx file.
>
> I thought that 'cg-fetch' would do the job of bringing down the
> missing pack again, and all would be well. Alas, this isn't the case.
>
> <http://pastebin.ca/246678>
>
> Pasky, on IRC, indicated that this might be because git-fetch-pack
> isn't downloading missing objects when the git:// protocol is being
> used.

These are the invariants between refs and objects:

 - objects that the refs (files under the .git/refs/ hierarchy that
   record 40-byte hexadecimal object names) point at are never
   missing, or the repository is corrupt.

 - objects that are reachable via pointers in another object that is
   not missing (a tag points at another object, a commit points at its
   tree and its parent commits, and a tree points at its subtrees and
   blobs) are never missing, or the repository is corrupt.

Git tools first fetch missing objects and then update your refs only
when the fetch succeeds completely, in order to maintain the above
invariants (a partial fetch does not update your refs). And these
invariants are why:

 - fsck-objects starts its reachability check from the refs;

 - commit walkers can stop at your existing refs;

 - the git native protocols only need to tell the other end what refs
   you have, in order for the other end to exclude what you already
   have from the set of objects it sends you.

What is missing needs to be determined in a reasonably efficient
manner, and the above invariants allow us to avoid doing the
equivalent of fsck-objects every time. Being able to trust refs is
fairly fundamental to the fetch operation of git.

I am not opposed to the idea of a new tool to fix a corrupted
repository that has broken the above invariants, perhaps caused by
accidental removal of objects and packs by end users. What it would
need to do is:

 - run fsck-objects to notice what is missing, by noting the "broken
   link from foo to bar" output messages. Object 'bar' is what you
   _ought_ to have according to your refs but don't (because you
   removed the objects that should be there), and everything reachable
   from it needs to be retrieved from the other side. Because you do
   not have 'bar', your end cannot determine which other objects
   already in your object store are reachable from it, so there would
   be some redundant download.

 - run a fetch-pack equivalent to get everything reachable starting at
   the above missing objects, pretending you do not have any object,
   because your refs are not trustworthy.

 - run fsck-objects again to make sure that your refs can now be
   trusted again.

To implement the second step above, you need to implement a modified
fetch-pack that does not trust any of your refs. It also needs to
ignore what is offered by the other end and instead ask for the
objects you know are missing ('bar' in the above example).
This program needs to talk to a modified upload-pack running at the
other end (let's call it upload-pack-recover), because the usual
upload-pack does not serve objects starting from a random object that
happens to be in its repository, but only starting from objects that
are pointed at by its own set of refs, to ensure integrity.

The upload-pack-recover program would need to start its traversal from
object 'bar' in the above example, and when it does so, it should not
just run 'rev-list --objects' starting at 'bar'. It first needs to
prove that its object store has everything that is reachable from
'bar' (the recipient would still end up with an incomplete repository
if it didn't). What this means is that it needs to prove that some of
its refs can reach 'bar' (again, on the upstream end only refs are
trusted; the mere existence of an object is not enough) before sending
objects back. The usual upload-pack does not have to do this because
it refuses to serve anything but what its refs point at (and by the
invariants, the objects pointed at by refs are guaranteed to be
complete [an object is "complete" if nothing reachable from it is
missing]).

This is needed because the repository might have discarded the branch
that used to reach 'bar'; the object 'bar' may have been in a pack
while some of its ancestors or component trees and/or blobs were
loose, and a subsequent git-prune may have removed the latter without
removing 'bar'. The mere existence of the object 'bar' does not mean
'bar' is complete.

So coming up with such a pair of programs is not rocket science, but
it is fairly delicate. I would rather have them as specialized
commands, not part of everyday commands, even if you were to implement
them.

Since this is not an everyday situation anyway, a far easier way would
be to clone-pack from the upstream into a new repository, take the
downloaded pack from that new repository, and mv it into your corrupt
repository. You can then run fsck-objects to see if you got back
everything you lost earlier.

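A rough shell sketch of the "far easier way" described above: see what
is missing, fetch a complete fresh pack into a scratch clone, move the
pack over, and re-run fsck-objects. The upstream URL and the local
paths are placeholders, not taken from the original report.

    # 1. See which objects the refs promise but the object store lacks.
    git fsck-objects --full 2>&1 | grep 'broken link'

    # 2. Clone the upstream into a scratch repository to obtain a
    #    complete, fresh pack (placeholder URL).
    git clone git://git.example.org/linux-2.6.git /tmp/rescue

    # 3. Move the downloaded pack and its index into the corrupt
    #    repository's object store.
    mv /tmp/rescue/.git/objects/pack/pack-*.pack \
       /tmp/rescue/.git/objects/pack/pack-*.idx \
       /path/to/corrupt/.git/objects/pack/

    # 4. Verify that the refs can be trusted again.
    cd /path/to/corrupt && git fsck-objects --full
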
* Re: should git download missing objects?
From: Alex Riesen @ 2006-11-13 19:45 UTC
To: Junio C Hamano; +Cc: Anand Kumria, git

Junio C Hamano, Sun, Nov 12, 2006 20:41:23 +0100:
> Since this is not an everyday situation anyway, a far easier way
> would be to clone-pack from the upstream into a new repository, take
> the downloaded pack from that new repository, and mv it into your
> corrupt repository. You can then run fsck-objects to see if you got
> back everything you lost earlier.

I get into such a situation annoyingly often, by using
"git clone -l -s from to" and then doing some "cleanup" in the origin
repository. For example, it happens that I remove a tag or a branch
and do a repack or prune afterwards. The related repositories, which
"accidentally" referenced the pruned objects, become "corrupt", as you
put it.

At the moment, if I run into the situation, I copy packs/objects from
all the repos I have (objects/info/alternates is useful here too), run
a fsck-objects/repack, and hope nothing is lost. It works, as I almost
always have "accidental" backups somewhere, but it is kind of annoying
to set up. A tool to do this job more effectively would be very handy
(at least it won't have to copy gigabytes of data over a switched
Windows network. Not often, I hope. Not _so_ many gigabytes, possibly).

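A minimal sketch of the alternates-based rescue described above,
assuming a sibling repository at /path/to/backup still contains the
pruned objects; the paths are illustrative, and the commands are run
from the top of the broken repository's working tree.

    # Borrow objects from a sibling repository that still has them.
    echo /path/to/backup/.git/objects >> .git/objects/info/alternates

    # With the alternate in place, fsck should no longer complain.
    git fsck-objects --full

    # Repack without -l (--local) so that the borrowed objects are
    # written into a local pack; after this the repository no longer
    # depends on the backup.
    git repack -a -d

    # Finally, remove the line added above from
    # .git/objects/info/alternates again.
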
* Re: should git download missing objects?
From: Shawn Pearce @ 2006-11-13 19:54 UTC
To: Alex Riesen; +Cc: Junio C Hamano, Anand Kumria, git

Alex Riesen <fork0@t-online.de> wrote:
> Junio C Hamano, Sun, Nov 12, 2006 20:41:23 +0100:
> > Since this is not an everyday situation anyway, a far easier way
> > would be to clone-pack from the upstream into a new repository,
> > take the downloaded pack from that new repository, and mv it into
> > your corrupt repository. You can then run fsck-objects to see if
> > you got back everything you lost earlier.
>
> I get into such a situation annoyingly often, by using
> "git clone -l -s from to" and then doing some "cleanup" in the origin
> repository. For example, it happens that I remove a tag or a branch
> and do a repack or prune afterwards. The related repositories, which
> "accidentally" referenced the pruned objects, become "corrupt", as
> you put it.
>
> At the moment, if I run into the situation, I copy packs/objects from
> all the repos I have (objects/info/alternates is useful here too),
> run a fsck-objects/repack, and hope nothing is lost. It works, as I
> almost always have "accidental" backups somewhere, but it is kind of
> annoying to set up. A tool to do this job more effectively would be
> very handy (at least it won't have to copy gigabytes of data over a
> switched Windows network. Not often, I hope. Not _so_ many gigabytes,
> possibly).

One of my coworkers recently lost a single loose tree object. We
suspect his Windows virus scanner deleted the file. :-(

Copying the one bad object over from another repository immediately
fixed the breakage, but it was very annoying not to be able to run a
"git fetch --missing-objects" or some such. Fortunately it was just
the one object, and it was also still loose in another repository.
scp was handy. :-)

--

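A sketch (bash) of that manual single-object fix: the object name
below is an arbitrary example, not the one from the message above, and
the remote host and path are placeholders. It relies only on the fact
that a loose object is stored as
.git/objects/<first two hex digits>/<remaining 38 digits>.

    # Suppose fsck-objects reported one missing tree object.
    sha=4b825dc642cb6eb9a060e54bf8d69288fbee4904   # hypothetical name

    # Copy the single loose object file from a repository that still
    # has it, then re-check the repository.
    mkdir -p .git/objects/${sha:0:2}
    scp user@otherhost:/path/to/good/repo/.git/objects/${sha:0:2}/${sha:2} \
        .git/objects/${sha:0:2}/
    git fsck-objects --full   # confirm that was the only missing object
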
* Re: should git download missing objects?
From: Petr Baudis @ 2006-11-13 20:03 UTC
To: Shawn Pearce; +Cc: Alex Riesen, Junio C Hamano, Anand Kumria, git

On Mon, Nov 13, 2006 at 08:54:14PM CET, Shawn Pearce wrote:
> Alex Riesen <fork0@t-online.de> wrote:
> > Junio C Hamano, Sun, Nov 12, 2006 20:41:23 +0100:
> > > Since this is not an everyday situation anyway, a far easier way
> > > would be to clone-pack from the upstream into a new repository,
> > > take the downloaded pack from that new repository, and mv it into
> > > your corrupt repository. You can then run fsck-objects to see if
> > > you got back everything you lost earlier.
> >
> > I get into such a situation annoyingly often, by using
> > "git clone -l -s from to" and then doing some "cleanup" in the
> > origin repository. For example, it happens that I remove a tag or a
> > branch and do a repack or prune afterwards. The related
> > repositories, which "accidentally" referenced the pruned objects,
> > become "corrupt", as you put it.
> >
> > At the moment, if I run into the situation, I copy packs/objects
> > from all the repos I have (objects/info/alternates is useful here
> > too), run a fsck-objects/repack, and hope nothing is lost. It
> > works, as I almost always have "accidental" backups somewhere, but
> > it is kind of annoying to set up. A tool to do this job more
> > effectively would be very handy (at least it won't have to copy
> > gigabytes of data over a switched Windows network. Not often, I
> > hope. Not _so_ many gigabytes, possibly).

cg-fetch -f, locally or over HTTP, should be able to fix that up if
used cleverly.

> One of my coworkers recently lost a single loose tree object. We
> suspect his Windows virus scanner deleted the file. :-(
>
> Copying the one bad object over from another repository immediately
> fixed the breakage, but it was very annoying not to be able to run a
> "git fetch --missing-objects" or some such. Fortunately it was just
> the one object, and it was also still loose in another repository.
> scp was handy. :-)

If it's over ssh, this is still where the heavily dusty (and heavily
"plumby") git-ssh-fetch command is useful, since it can be passed an
undocumented --recover argument, and then it will fetch _all_ the
objects you are missing, not assuming anything.

Perhaps I should reintroduce support for git-ssh-fetch in cg-fetch, to
be used in the case of -f over SSH. But it would be silly if I did
that and the next Git then removed the command from its suite. Junio,
what is its life expectancy? I guess this usage scenario is something
to take into account when thinking about removing it; I know that I
wanted to get rid of it in the past, but now my opinion is changing.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1

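For illustration, an invocation of the --recover mode mentioned above
might look roughly like the sketch below. The exact option set and URL
form of git-ssh-fetch differ between git versions and should be
checked against its manual page; the branch name, host, and path are
placeholders.

    # Ask the remote (over ssh) for every object reachable from the
    # given commit, without assuming that anything already present
    # locally is complete.
    git-ssh-fetch --recover -a \
        $(git rev-parse refs/heads/master) \
        otherhost.example.org:/path/to/repo.git
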
* Re: should git download missing objects?
From: Shawn Pearce @ 2006-11-13 20:10 UTC
To: Petr Baudis; +Cc: Alex Riesen, Junio C Hamano, Anand Kumria, git

Petr Baudis <pasky@suse.cz> wrote:
> On Mon, Nov 13, 2006 at 08:54:14PM CET, Shawn Pearce wrote:
> > Copying the one bad object over from another repository immediately
> > fixed the breakage, but it was very annoying not to be able to run
> > a "git fetch --missing-objects" or some such. Fortunately it was
> > just the one object, and it was also still loose in another
> > repository. scp was handy. :-)
>
> If it's over ssh, this is still where the heavily dusty (and heavily
> "plumby") git-ssh-fetch command is useful, since it can be passed an
> undocumented --recover argument, and then it will fetch _all_ the
> objects you are missing, not assuming anything.

Interesting. Since it's undocumented, I didn't know it existed until
now. :)

I'm thinking, though, that a --recover should just be part of
git-fetch, and that it should work on all transports, not just SSH.

Of course you could get into a whole world of hurt where you keep
doing fsck-objects --full (listing out the missing objects), fetching
them, only to find more missing, etc. After a couple of cycles of that
it may just be better to claim to the other end that you have nothing
but want everything (as in an initial clone) and get a new pack from
which you can pull objects. But I think that was sort of Junio's point
on this topic.

I'm just trying to throw in my +1 in favor of a feature that would
have recovered that sole missing object without making the end user
re-clone their entire repository and move pack files around by hand.
And I'm being more verbose about it than just "+1". :)

--

* Re: should git download missing objects?
From: Junio C Hamano @ 2006-11-13 20:22 UTC
To: Petr Baudis; +Cc: git

Petr Baudis <pasky@suse.cz> writes:

> ... Junio, what is its life expectancy? I guess this usage scenario
> is something to take into account when thinking about removing it; I
> know that I wanted to get rid of it in the past, but now my opinion
> is changing.

It uses the same commit-walker semantics and mechanism, so I do not
think it is too much of a burden to carry it, but I'd rather have
something that works over the git native protocol if we really care
about this. People without ssh access need to be able to recover over
the git:// protocol.

* Re: should git download missing objects?
From: Petr Baudis @ 2006-11-14 20:08 UTC
To: Junio C Hamano; +Cc: git

On Mon, Nov 13, 2006 at 09:22:13PM CET, Junio C Hamano wrote:
> Petr Baudis <pasky@suse.cz> writes:
>
> > ... Junio, what is its life expectancy? I guess this usage scenario
> > is something to take into account when thinking about removing it;
> > I know that I wanted to get rid of it in the past, but now my
> > opinion is changing.
>
> It uses the same commit-walker semantics and mechanism, so I do not
> think it is too much of a burden to carry it, but I'd rather have
> something that works over the git native protocol if we really care
> about this. People without ssh access need to be able to recover over
> the git:// protocol.

Even though I obviously agree with the above, it would be useful to
have the flag even if git:// (which is apparently harder to get right
than the others) is not supported. After all, most repositories I've
seen that are available over git:// are available over HTTP as well.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1

* Re: should git download missing objects?
From: Junio C Hamano @ 2006-11-13 20:05 UTC
To: Alex Riesen; +Cc: git

fork0@t-online.de (Alex Riesen) writes:

> Junio C Hamano, Sun, Nov 12, 2006 20:41:23 +0100:
>> Since this is not an everyday situation anyway, a far easier way
>> would be to clone-pack from the upstream into a new repository, take
>> the downloaded pack from that new repository, and mv it into your
>> corrupt repository. You can then run fsck-objects to see if you got
>> back everything you lost earlier.
>
> I get into such a situation annoyingly often, by using
> "git clone -l -s from to" and then doing some "cleanup" in the origin
> repository. For example, it happens that I remove a tag or a branch
> and do a repack or prune afterwards. The related repositories, which
> "accidentally" referenced the pruned objects, become "corrupt", as
> you put it.
>
> At the moment, if I run into the situation, I copy packs/objects from
> all the repos I have (objects/info/alternates is useful here too),
> run a fsck-objects/repack, and hope nothing is lost. It works, as I
> almost always have "accidental" backups somewhere, but it is kind of
> annoying to set up. A tool to do this job more effectively would be
> very handy (at least it won't have to copy gigabytes of data over a
> switched Windows network. Not often, I hope. Not _so_ many gigabytes,
> possibly).

I suspect it is a different issue. Maybe you would need reverse links
from the origin directory to the .git/refs/ directory of the
repositories that borrow from it, to prevent pruning. No amount of
butchering fetch-pack to look behind incomplete refs that lie and
claim they are complete would solve your problem if you do not have
any "accidental backups".

In general, 'git clone -l -s' origin directories may not be writable
by the person who is making the clone, so we should not do this inside
'git clone'. Also, you could add alternates after you set up your
repository, so maybe something like this would help?

#!/bin/sh
#
# Usage: git-add-alternates other_repo
#
: ${GIT_DIR=.git}
my_refs=`cd $GIT_DIR/refs && pwd`
other=$1
test -d "$other/.git" && test -d "$other/.git/objects" || {
	echo >&2 "I do not see a repository at $other"
	exit 1
}
mkdir -p "$other/.git/refs/borrowers" || {
	echo >&2 "You cannot write in $other"
	echo >&2 "Arrange with the owner of it to make"
	echo >&2 "sure the objects you need are not pruned."
	exit 2
}
cnt=0
while test -d "$other/.git/refs/borrowers/$cnt"
do
	cnt=$(($cnt + 1))
done
ln -s "$my_refs" "$other/.git/refs/borrowers/$cnt"

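A short usage sketch for the script above, assuming it is saved on the
PATH under the name given in its usage comment, git-add-alternates,
and that the borrowing clone was made with "git clone -l -s"; the
directory names are placeholders.

    # Run from inside the borrowing repository created with
    # "git clone -l -s /path/to/origin work":
    cd work
    git-add-alternates /path/to/origin

    # This leaves a symlink such as
    #   /path/to/origin/.git/refs/borrowers/0 -> /path/to/work/.git/refs
    # which a prune in the origin could, in principle, consult so that
    # objects still referenced by borrowers are not discarded.
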
* Re: should git download missing objects?
From: Alex Riesen @ 2006-11-13 22:52 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano, Mon, Nov 13, 2006 21:05:48 +0100:
> > Junio C Hamano, Sun, Nov 12, 2006 20:41:23 +0100:
> >> Since this is not an everyday situation anyway, a far easier way
> >> would be to clone-pack from the upstream into a new repository,
> >> take the downloaded pack from that new repository, and mv it into
> >> your corrupt repository. You can then run fsck-objects to see if
> >> you got back everything you lost earlier.
> >
> > I get into such a situation annoyingly often, by using
> > "git clone -l -s from to" and then doing some "cleanup" in the
> > origin repository. For example, it happens that I remove a tag or a
> > branch and do a repack or prune afterwards. The related
> > repositories, which "accidentally" referenced the pruned objects,
> > become "corrupt", as you put it.
> >
> > At the moment, if I run into the situation, I copy packs/objects
> > from all the repos I have (objects/info/alternates is useful here
> > too), run a fsck-objects/repack, and hope nothing is lost. It
> > works, as I almost always have "accidental" backups somewhere, but
> > it is kind of annoying to set up. A tool to do this job more
> > effectively would be very handy (at least it won't have to copy
> > gigabytes of data over a switched Windows network. Not often, I
> > hope. Not _so_ many gigabytes, possibly).
>
> I suspect it is a different issue. Maybe you would need reverse links
> from the origin directory to the .git/refs/ directory of the
> repositories that borrow from it, to prevent pruning. No amount of
> butchering fetch-pack to look behind incomplete refs that lie and
> claim they are complete would solve your problem if you do not have
> any "accidental backups".

It is not about preventing this from happening. It is about recovering
from user error (which I plainly made). The discussion about
"git fetch --recover" sounds very much like what would have helped in
that situation. I'll just try not to do it next time, but if I do, it
would be nice to have a tool to help me recover from it. Not prevent
it (I don't see that as possible), just help.

Anyway, it is kind of too late for those repositories. And it would
not be very convenient to work with: the branches in the slave repos
come and go often; they pull from each other and push into the central
(aka origin) repo. Keeping the borrowed refs in sync would be a
nightmare (as in: "I promise to forget doing it").
