* Recovering from repository corruption
@ 2008-06-10 17:26 Denis Bueno
  2008-06-10 17:55 ` Jakub Narebski
  2008-06-10 19:40 ` Recovering from repository corruption Nicolas Pitre
  0 siblings, 2 replies; 31+ messages in thread
From: Denis Bueno @ 2008-06-10 17:26 UTC
To: Git Mailing List

I started a thread a while back about repository corruption.  It
manifested as a clone error, and the thread is here:

    http://kerneltrap.org/mailarchive/git/2007/7/31/253475

I just ran into corruption again after my laptop kernel-panic'd.
(Ironically, at the moment I ran into the corruption I was trying to
push my repo to a backup location.)

Since that thread took place, it seems a section about recovering from
repo corruption was added to the manual --- but it assumes you can (or
care to painstakingly) recreate each corrupted version.

I made several changes to one file, home.html, and now have the
following corruption:

    identity.corrupt[56] > git fsck --full
    error: 320bd6e82267b71dd2ca7043ea3f61dbbca16109: object corrupt or missing
    error: 4d0be2816d5eea5ae2b40990235e2225c1715927: object corrupt or missing
    missing blob 320bd6e82267b71dd2ca7043ea3f61dbbca16109
    missing blob 4d0be2816d5eea5ae2b40990235e2225c1715927

I know which commits these hashes correspond to, and I know roughly
what I did in those commits, but I really don't care that much, and
anyway it will be painful to recreate them because of
whitespace/formatting issues.  Here are the commits, in case it is
relevant:

    commit 163a93df14d246dee91c3a503e6372b8313f337d
    Author: Denis Bueno <dbueno@gmail.com>
    Date:   Tue Jun 10 09:45:41 2008 -0400

        Add lambda-the-ultimate link

    :100644 100644 320bd6e... 2ab4775... M  home.html

    [... intervening commits ...]

    commit 4737fea59fdc8325e09b5206cc7a6ac593446ce3
    Author: Denis Bueno <dbueno@gmail.com>
    Date:   Tue Jun 10 09:37:12 2008 -0400

        Hoogle up top too

    :100644 100644 4d0be28... c6fe111... M  home.html

Assuming I can't recreate the hashed files, what are my options?

I was told in the thread above that I could use grafts and "git
filter-branch" to create a new repository that simply got rid of the
offending object.  That case was simpler, as it was the initial import
of a file that had only two commits total that was corrupted.
However, in this case there are changes between the initial and latest
version of the file, and commits between the corrupted versions, so, I
can imagine that it would be hard to get rid of in-between commits.

The thing that makes sense intuitively (read: not as a Git expert, but
as a user) is to just let me replace the commits associated with the
problematic objects with new versions of those commits (e.g. make
change described in the commit message, which is different from the
actual change that was recorded, due to whitespace/formatting issues).
Is this what I should do?  And to do so, should I be reading chapter
5 of the manual?

Thanks.

--
Denis
* Re: Recovering from repository corruption
From: Jakub Narebski @ 2008-06-10 17:55 UTC
To: Denis Bueno; +Cc: Git Mailing List

"Denis Bueno" <dbueno@gmail.com> writes:

> I was told in the thread above that I could use grafts and "git
> filter-branch" to create a new repository that simply got rid of the
> offending object.  That case was simpler, as it was the initial import
> of a file that had only two commits total that was corrupted.
> However, in this case there are changes between the initial and latest
> version of the file, and commits between the corrupted versions, so, I
> can imagine that it would be hard to get rid of in-between commits.
>
> The thing that makes sense intuitively (read: not as a Git expert, but
> as a user) is to just let me replace the commits associated with the
> problematic objects with new versions of those commits (e.g. make
> change described in the commit message, which is different from the
> actual change that was recorded, due to whitespace/formatting issues).
> Is this what I should do?  And to do so, should I be reading chapter
> 5 of the manual?

Without checking the Git User's Manual, I think the solution could go
as follows.

Assume that the history looks like this:

   ...---.---a---*---b---.---...

where '*' marks the corrupted commit (a commit whose tree contains
corrupted blobs).

First, you can check the commit message for '*' using git-cat-file or
git-show, and you can get the difference between 'a' and 'b' using
"git diff a b".  When you know what the repaired commit 'X' should
look like, do something like:

  $ git checkout -b <temp-branch> 'a'
  $ <edit edit edit>
  $ git commit

Then the history would look like this:

   ...---.---a---*---b---.---...
              \
               \-X

Now use grafts to make 'b' a child of 'X', i.e. modify the parent of
'b' so that the history looks like this:

   ...---.---a---*   b---.---...
              \     /
               \-X-/

Examine the history using git-log and git-show, check the tree with
git-ls-tree and by examining files, and use a graphical history
browser such as gitk.

Then, if possible, use git-filter-branch to make the history recorded
in the grafts file permanent...

HTH
--
Jakub Narebski
Poland
ShadeHawk on #git
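A minimal sketch of what a grafts entry for the re-parenting step looks like, assuming the usual .git/info/grafts layout of one line per commit: the commit id followed by the ids of the parents it should appear to have, all as full 40-character SHA-1s. The helper name graft_line and the ids used in the demo are hypothetical placeholders, not values from this thread.

```python
import re

# A full object name: exactly 40 lowercase hex digits.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def graft_line(commit, parents):
    """Return one grafts entry: the commit id, then its replacement parents."""
    ids = [commit, *parents]
    for sha in ids:
        if not FULL_SHA1.fullmatch(sha):
            raise ValueError(f"not a full 40-hex SHA-1: {sha!r}")
    return " ".join(ids)


# Hypothetical ids standing in for 'b' and 'X' in the diagram.
commit_b = "b" * 40
commit_x = "c" * 40

# "Make 'b' a child of 'X'": b's only parent becomes X.
print(graft_line(commit_b, [commit_x]))
```

The printed line is what would be appended to .git/info/grafts; the symbolic names 'b' and 'X' have to be resolved to real ids first, for example with git rev-parse.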
* Re: Recovering from repository corruption
From: Denis Bueno @ 2008-06-10 19:38 UTC
To: Jakub Narebski; +Cc: Git Mailing List

On Tue, Jun 10, 2008 at 13:55, Jakub Narebski <jnareb@gmail.com> wrote:
> Assume that history looks like this
>
>    ...---.---a---*---b---.---...
>
> where by '*' is marked corrupted commit (commit whose tree contains
> corrupted blobs).
>
> First, you can check the commit message for '*' using git-cat-file or
> git-show, you can get the difference between 'a' and 'b' using
> "git diff a b".  When you know how repaired commit 'X' should look
> like, do something like:
>
>   $ git checkout -b <temp-branch> 'a'
>   $ <edit edit edit>
>   $ git commit
>
> Then history would look like this
>
>    ...---.---a---*---b---.---...
>               \
>                \-X
>
> Now with grafts make 'b' be a child of 'X', i.e. modify parent of 'b'
> for history to look like below:
>
>    ...---.---a---*   b---.---...
>               \     /
>                \-X-/
>
> Examine history using git-log, git-show, check tree with git-ls-tree
> and examining files, use graphical history browser like gitk.
>
> Then if possible use git-filter-branch to make history recorded in
> grafts file permanent...
>
> HTH
> --
> Jakub Narebski
> Poland
> ShadeHawk on #git
>

Thanks for the help.  My situation was:

    ...---a---*---b---c---d---*---e---...

Following your example, I believe I got this to:

    ...---a---*   b---c---d---*   e---...
           \     /         \     /
            \-X-/           \---/

That is, I replaced the first problematic commit and deleted the
second, since I forgot how I changed 'd' to get that commit.

I put the following in .git/info/grafts:

    'b' X
    'e' 'd'

(which I gathered from here:
http://thread.gmane.org/gmane.comp.version-control.git/66398/focus=66402.
I've never used grafts before.  A bit about them should be put in the
manual, if it's not there already.  =])

Then I ran:

    git-filter-branch HEAD ^X ^'d'

Now "git log --raw --all" doesn't show any of the problematic SHA-1
hashes anymore!  However:

    identity.fb[173] > git fsck --full
    error: 320bd6e82267b71dd2ca7043ea3f61dbbca16109: object corrupt or missing
    error: 4d0be2816d5eea5ae2b40990235e2225c1715927: object corrupt or missing
    missing blob 320bd6e82267b71dd2ca7043ea3f61dbbca16109
    missing blob 4d0be2816d5eea5ae2b40990235e2225c1715927

Shouldn't these be unreferenced now that I've run filter-branch?

--
Denis
* Re: Recovering from repository corruption
From: Jakub Narebski @ 2008-06-10 19:59 UTC
To: Denis Bueno; +Cc: Git Mailing List

On Tue, 10 Jun 2008, Denis Bueno wrote:

> However:
>
> identity.fb[173] > git fsck --full
> error: 320bd6e82267b71dd2ca7043ea3f61dbbca16109: object corrupt or missing
> error: 4d0be2816d5eea5ae2b40990235e2225c1715927: object corrupt or missing
> missing blob 320bd6e82267b71dd2ca7043ea3f61dbbca16109
> missing blob 4d0be2816d5eea5ae2b40990235e2225c1715927
>
> Shouldn't these be unreferenced now that I've run filter-branch?

Try cloning this repository (using the file:/// pseudo-protocol to
force a transfer of objects instead of hardlinking them), and check
whether the problem persists in the clone too.  If it does not, the
error/missing objects might be in "garbage".

But I'm not sure...

--
Jakub Narebski
Poland
* Re: Recovering from repository corruption
From: Denis Bueno @ 2008-06-10 20:03 UTC
To: Jakub Narebski; +Cc: Git Mailing List

On Tue, Jun 10, 2008 at 15:59, Jakub Narebski <jnareb@gmail.com> wrote:
>> Shouldn't these be unreferenced now that I've run filter-branch?
>
> Try to clone this repository (using file:/// pseudo-protocol to force
> transfer of objects instead of hardlinking them), and check if the
> problem persists in the clone too.  If not, error/missing might be
> in "garbage".
>
> But I'm not sure...

You're onto something:

    [dorothy.local /tmp <Tue Jun 10> <16:02:08>]
    tmp[176] > git clone file:///Volumes/work/identity.fb/
    Initialized empty Git repository in /tmp/identity.fb/.git/
    remote: Counting objects: 401, done.
    remote: Compressing objects: 100% (364/364), done.
    remote: Total 401 (delta 170), reused 0 (delta 0)
    Receiving objects: 100% (401/401), 233.76 KiB, done.
    Resolving deltas: 100% (170/170), done.

    [dorothy.local /tmp <Tue Jun 10> <16:02:22>]
    tmp[177] > cd identity.fb/
    /tmp/identity.fb

    [dorothy.local /tmp/identity.fb <Tue Jun 10> <16:02:24>]
    identity.fb[178] > git fsck --full
    broken link from  commit 4737fea59fdc8325e09b5206cc7a6ac593446ce3
                  to  commit fe431b4b69453ad9207a5528cf9b9d12ef69c988
    dangling commit 28aa69aafc8ae901e588f6d341b3e6d3558c6d26
    dangling commit 884a8024fbcb9367726abb25f8bb6ac539712d46
    missing commit fe431b4b69453ad9207a5528cf9b9d12ef69c988

But I've just substituted one error for another.  Are these errors
easier to fix?

--
Denis
* Re: Recovering from repository corruption
From: Jakub Narebski @ 2008-06-10 20:14 UTC
To: Denis Bueno; +Cc: Git Mailing List

On Tue, 10 Jun 2008, Denis Bueno wrote:
> On Tue, Jun 10, 2008, Jakub Narebski <jnareb@gmail.com> wrote:
>> Denis Bueno wrote:
>>>
>>> Shouldn't these be unreferenced now that I've run filter-branch?
>>
>> Try to clone this repository (using file:/// pseudo-protocol to force
>> transfer of objects instead of hardlinking them), and check if the
>> problem persists in the clone too.  If not, error/missing might be
>> in "garbage".
>>
>> But I'm not sure...
>
> You're onto something:
>
>     [dorothy.local /tmp <Tue Jun 10> <16:02:08>]
>     tmp[176] > git clone file:///Volumes/work/identity.fb/
>     Initialized empty Git repository in /tmp/identity.fb/.git/
>     remote: Counting objects: 401, done.
>     remote: Compressing objects: 100% (364/364), done.
>     remote: Total 401 (delta 170), reused 0 (delta 0)
>     Receiving objects: 100% (401/401), 233.76 KiB, done.
>     Resolving deltas: 100% (170/170), done.
>
>     [dorothy.local /tmp <Tue Jun 10> <16:02:22>]
>     tmp[177] > cd identity.fb/
>     /tmp/identity.fb
>
>     [dorothy.local /tmp/identity.fb <Tue Jun 10> <16:02:24>]
>     identity.fb[178] > git fsck --full
>     broken link from  commit 4737fea59fdc8325e09b5206cc7a6ac593446ce3
>                   to  commit fe431b4b69453ad9207a5528cf9b9d12ef69c988
>     dangling commit 28aa69aafc8ae901e588f6d341b3e6d3558c6d26
>     dangling commit 884a8024fbcb9367726abb25f8bb6ac539712d46
>     missing commit fe431b4b69453ad9207a5528cf9b9d12ef69c988
>
> But I've just substituted one error for another.  Are these errors
> easier to fix?

Please remember that such a clone does _not_ have the grafts info
(unless you copy it manually), so it is a good test of whether you
rewrote the history correctly using git-filter-branch.  So take a look
at the history in your clone using gitk or some similar tool.

In the history you mentioned:

    ...---a---*   b---c---d---*   e---...
           \     /         \     /
            \-X-/           \---/

you should rewrite from 'a'=='X^' up to and including 'e' (and not
only from 'd').

But if that is not the case, I'm afraid I won't be able to offer any
further insight...

--
Jakub Narebski
Poland
* Re: Recovering from repository corruption
From: Denis Bueno @ 2008-06-10 20:35 UTC
To: Jakub Narebski; +Cc: Git Mailing List

On Tue, Jun 10, 2008 at 16:14, Jakub Narebski <jnareb@gmail.com> wrote:
> Please remember that in such clone you _don't_ have grafts info (unless
> you copy it manually), so it is a good test if you correctly rewrote
> history using git-filter-branch.  So take a look at history in your
> clone using gitk or some similar tool.
>
> In the history you mentioned:
>
>     ...---a---*   b---c---d---*   e---...
>            \     /         \     /
>             \-X-/           \---/
>
> you should rewrite from 'a'=='X^' to, and including 'e' (and not only
> from 'd').

So I re-did the filter-branch as:

    git-filter-branch HEAD 28aa69aafc8ae901e588f6d341b3e6d3558c6d26^..163a93df14d246dee91c3a503e6372b8313f337d

Now cloning still works and only shows dangling commits --- no errors!

--
Denis
* Re: Recovering from repository corruption
From: Linus Torvalds @ 2008-06-10 20:23 UTC
To: Denis Bueno; +Cc: Jakub Narebski, Git Mailing List

On Tue, 10 Jun 2008, Denis Bueno wrote:
>
> You're onto something:
>
> [dorothy.local /tmp <Tue Jun 10> <16:02:08>]
> tmp[176] > git clone file:///Volumes/work/identity.fb/

[ successful ]

Hmm. Scary. That should *not* have been successful with a corrupt repo.

Unless you have done a .grafts file to hide the corruption, or something
like that?

Have you saved away the original corrupt repo (the whole .git directory as
a tar-ball, for example)? And is the data public and non-embarrassing
enough so that you could make it available for some post-corruption
analysis? Even if we cannot help recover it, real-life corruption is
always interesting to see if only as a test-case to make sure that git
notices it as quickly as possible.

		Linus
* Re: Recovering from repository corruption
From: Denis Bueno @ 2008-06-10 20:28 UTC
To: Linus Torvalds; +Cc: Git Mailing List

On Tue, Jun 10, 2008 at 16:23, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Tue, 10 Jun 2008, Denis Bueno wrote:
>>
>> You're onto something:
>>
>> [dorothy.local /tmp <Tue Jun 10> <16:02:08>]
>> tmp[176] > git clone file:///Volumes/work/identity.fb/
>
> [ successful ]
>
> Hmm. Scary. That should *not* have been successful with a corrupt repo.
>
> Unless you have done a .grafts file to hide the corruption, or something
> like that?

I intended to do that, yes, and I think I was successful.  (I only say
I "intended to" --- instead of "I did" --- because I read the
documentation for the grafts file elsewhere on this list, and not in
some more "blessed" location.)

> Have you saved away the original corrupt repo (the whole .git directory as
> a tar-ball, for example)? And is the data public and non-embarrassing
> enough so that you could make it available for some post-corruption
> analysis? Even if we cannot help recover it, real-life corruption is
> always interesting to see if only as a test-case to make sure that git
> notices it as quickly as possible.

I do have bunches of personal information in the repo, unfortunately.
The particular *file* involved in the corruption, however, is fine for
all to view.  Is that useful?

--
Denis
* Re: Recovering from repository corruption
From: Linus Torvalds @ 2008-06-10 21:09 UTC
To: Denis Bueno; +Cc: Git Mailing List

On Tue, 10 Jun 2008, Denis Bueno wrote:
> >
> > Hmm. Scary. That should *not* have been successful with a corrupt repo.
> >
> > Unless you have done a .grafts file to hide the corruption, or something
> > like that?
>
> I intended to do that, yes, and I think I was successful.

Ahh, ok.

Yes, we should probably re-think our 'grafts' file thing, or at least not
document it, because it's actually a wonderful way to just cause more
corruption by hiding things (ie if you clone a repo with a grafts file,
the result will now have neither the grafts file _nor_ the state that was
hidden by it, so the result is guaranteed to be corrupt).

But that explains why your clone worked, and why the resulting repo had
different corruption - it avoided the original corruption, but because of
the grafts file it avoided it by just not having those commits at all..

> I do have bunches of personal information in the repo, unfortunately.
> The particular *file* involved in the corruption, however, is fine for
> all to view.  Is that useful?

No, almost all the interest is basically in how the whole repo ties
together. The individual corrupt files may be interesting, though, ie from
your original report:

	error: 320bd6e82267b71dd2ca7043ea3f61dbbca16109: object corrupt or missing
	error: 4d0be2816d5eea5ae2b40990235e2225c1715927: object corrupt or missing

then *if* you have the files

	.git/objects/32/0bd6e82267b71dd2ca7043ea3f61dbbca16109
	.git/objects/4d/0be2816d5eea5ae2b40990235e2225c1715927

then those two files are interesting in themselves (most likely they are
not there at all, or are zero-sized, but if you have them, please post
them).

And as this was a result of a real filesystem crash, it *is* possible that
you have something in the /lost+found directory for that filesystem. If
so, those missing files may be found there.

		Linus
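A small sketch of how the two object names from the fsck report map to the loose-object paths listed above: the first two hex digits name a directory under .git/objects and the remaining 38 name the file. The helper name loose_object_path is hypothetical, and the mapping only applies to loose objects; an object stored in a packfile has no file of its own.

```python
from pathlib import PurePosixPath


def loose_object_path(sha1, git_dir=".git"):
    """Map a full 40-hex object name to its loose-object path."""
    if len(sha1) != 40:
        raise ValueError("expected a full 40-character SHA-1")
    # First two hex digits name the directory, the remaining 38 the file.
    return PurePosixPath(git_dir, "objects", sha1[:2], sha1[2:])


# The two blobs reported as corrupt or missing in this thread.
for name in ("320bd6e82267b71dd2ca7043ea3f61dbbca16109",
             "4d0be2816d5eea5ae2b40990235e2225c1715927"):
    print(loose_object_path(name))
```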
* Re: Recovering from repository corruption
From: Denis Bueno @ 2008-06-10 21:22 UTC
To: Linus Torvalds; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1054 bytes --]

On Tue, Jun 10, 2008 at 17:09, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> No, almost all the interest is basically in how the whole repo ties
> together. The individual corrupt files may be interesting, though, ie from
> your original report:
>
>        error: 320bd6e82267b71dd2ca7043ea3f61dbbca16109: object corrupt or missing
>        error: 4d0be2816d5eea5ae2b40990235e2225c1715927: object corrupt or missing
>
> then *if* you have the files
>
>        .git/objects/32/0bd6e82267b71dd2ca7043ea3f61dbbca16109
>        .git/objects/4d/0be2816d5eea5ae2b40990235e2225c1715927
>
> then those two files are interesting in themselves (most likely they are
> not there at all, or are zero-sized, but if you have them, please post
> them).

They are attached, and they are not zero-sized.

> And as this was a result of a real filesystem crash, it *is* possible that
> you have something in the /lost+found directory for that filesystem. If
> so, those missing files may be found there.

I checked; no such luck.

--
Denis

[-- Attachment #2: 0bd6e82267b71dd2ca7043ea3f61dbbca16109 --]
[-- Type: application/octet-stream, Size: 2145 bytes --]

[-- Attachment #3: 0be2816d5eea5ae2b40990235e2225c1715927 --]
[-- Type: application/octet-stream, Size: 2110 bytes --]
* Re: Recovering from repository corruption
From: Linus Torvalds @ 2008-06-10 21:48 UTC
To: Denis Bueno; +Cc: Git Mailing List

On Tue, 10 Jun 2008, Denis Bueno wrote:
> >
> > then *if* you have the files
> >
> >        .git/objects/32/0bd6e82267b71dd2ca7043ea3f61dbbca16109
> >        .git/objects/4d/0be2816d5eea5ae2b40990235e2225c1715927
> >
> > then those two files are interesting in themselves (most likely they are
> > not there at all, or are zero-sized, but if you have them, please post
> > them).
>
> They are attached, and they are not zero-sized.

Very interesting.

Both of them look fairly sane as objects (ie random - it's supposed to be
zlib-compressed), but both of them have the first 512 bytes *identically*
corrupted:

	0000000    6564    626e    6575    406e    6f64    6f72    6874    2e79
	          d   e   n   b   u   e   n   @   d   o   r   o   t   h   y   .
	0000020    6f6c    6163    2e6c    3634    0033    0000    0000    0000
	          l   o   c   a   l   .   4   6   3  \0  \0  \0  \0  \0  \0  \0
	0000040    0000    0000    0000    0000    0000    0000    0000    0000
	         \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
	*

ie it's an all-zero block, except for that email-looking thing at the
head.

Sadly, I don't think there is any way to find the missing block that got
overwritten. And quite frankly, there's no way to really know whether the
rest was really fine either - it just looks more likely, but quite
frankly, it could have been random old contents on your disk too that just
happens to look like the expected random pattern (which you'll get with
any compression format - compression by definition removes patterns).

One thing that strikes me is that you seem to be really prone to this
problem, since it happened to you a year ago too. I cannot swear to this,
but I literally suspect your last case (July-2007) was the previous time
we had a corruption issue. Why does it seem to happen to you, but not
others?

Do you have some odd filesystem in play? Was the current corruption in a
similar environment as the old one? IOW, I'm trying to find a pattern
here, to see if there might be something we can do about it..

But it *sounds* like the objects you lost were literally old ones, no? Ie
the lost stuff wasn't something you had committed in the last five minutes
or so? If so, then you really do seem to have a filesystem that corrupts
*old* files when it crashes. That's fairly scary. What FS is it?

		Linus
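A rough sketch of the kind of check behind the analysis above, assuming the standard loose-object format: the file is a zlib stream whose decompressed bytes are "<type> <size>\0<content>", and the object name is the SHA-1 of those decompressed bytes. The function names make_loose_blob and check_loose_object and the "mostly NUL" heuristic are illustrative assumptions, not anything git itself provides; the damaged sample imitates the overwritten first 512 bytes shown in the dump.

```python
import hashlib
import zlib


def make_loose_blob(content):
    """Return (on-disk bytes, object name) for a blob in loose format."""
    store = b"blob %d\x00" % len(content) + content
    return zlib.compress(store), hashlib.sha1(store).hexdigest()


def check_loose_object(raw, expected_sha1):
    """Classify the raw bytes of a loose object file."""
    try:
        store = zlib.decompress(raw)
    except zlib.error:
        head = raw[:512]
        # Heuristic only: a leading block that is mostly NUL bytes
        # matches the pattern seen in this thread.
        if head.count(0) > len(head) // 2:
            return "corrupt: leading block is mostly NUL bytes"
        return "corrupt: not a valid zlib stream"
    if hashlib.sha1(store).hexdigest() != expected_sha1:
        return "corrupt: SHA-1 mismatch"
    return "ok"


# Incompressible sample content, so the compressed object exceeds 512 bytes.
content = b"".join(hashlib.sha1(bytes([i])).digest() for i in range(100))
raw, name = make_loose_blob(content)
print(check_loose_object(raw, name))

# Overwrite the first 512 bytes the way the attached files were damaged.
damaged = b"denbuen@dorothy.local.463".ljust(512, b"\x00") + raw[512:]
print(check_loose_object(damaged, name))
```

Note that a passing zlib check alone says little about the tail of a damaged file, which is the point made above: compressed data looks random whether or not it is the right data, so only the SHA-1 comparison is conclusive.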
* Re: Recovering from repository corruption
From: Denis Bueno @ 2008-06-10 22:09 UTC
To: Linus Torvalds; +Cc: Git Mailing List

On Tue, Jun 10, 2008 at 17:48, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Tue, 10 Jun 2008, Denis Bueno wrote:
>> >
>> > then *if* you have the files
>> >
>> >        .git/objects/32/0bd6e82267b71dd2ca7043ea3f61dbbca16109
>> >        .git/objects/4d/0be2816d5eea5ae2b40990235e2225c1715927
>> >
>> > then those two files are interesting in themselves (most likely they are
>> > not there at all, or are zero-sized, but if you have them, please post
>> > them).
>>
>> They are attached, and they are not zero-sized.
>
> Very interesting.
>
> Both of them look fairly sane as objects (ie random - it's supposed to be
> zlib-compressed), but both of them have the first 512 bytes *identically*
> corrupted:
>
>        0000000    6564    626e    6575    406e    6f64    6f72    6874    2e79
>                  d   e   n   b   u   e   n   @   d   o   r   o   t   h   y   .
>        0000020    6f6c    6163    2e6c    3634    0033    0000    0000    0000
>                  l   o   c   a   l   .   4   6   3  \0  \0  \0  \0  \0  \0  \0
>        0000040    0000    0000    0000    0000    0000    0000    0000    0000
>                 \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
>        *
>
> ie it's an all-zero block, except for that email-looking thing at the
> head.

Right --- that's my username and computer's hostname... for some
reason.

[You are not expected to understand this.  My computer's name
mysteriously changed.  It should not be "dorothy.local" but it is.  I
will have to find out why....]

> One thing that strikes me is that you seem to be really prone to this
> problem, since it happened to you a year ago too. I cannot swear to this,
> but I literally suspect your last case (July-2007) was the previous time
> we had a corruption issue. Why does it seem to happen to you, but not
> others?

It is the same computer on which the problem occurred last time.  It's
an OS X 10.4 macbook pro.  I haven't noticed corruption in other
places, but it's fair to assume it's occurring.  I'll have to boot off
my install disk and fsck the drive....

> Do you have some odd filesystem in play? Was the current corruption in a
> similar environment as the old one? IOW, I'm trying to find a pattern
> here, to see if there might be something we can do about it..

I can't remember if the old one happened after a panic or not, but I'd
bet it did.  The filesystem is HFS+, as indeed most OS X 10.4
installations are.  Maybe the HD has been going south?  However, that
doesn't seem likely, since when I got the computer it was new, and
that was around Jun 2007.

> But it *sounds* like the objects you lost were literally old ones, no? Ie
> the lost stuff wasn't something you had committed in the last five minutes
> or so? If so, then you really do seem to have a filesystem that corrupts
> *old* files when it crashes. That's fairly scary. What FS is it?

No, in fact I had just committed those changes not 10 minutes before
the panic.  Last time they were also fresh changes, although perhaps
older than 10 minutes.  I can't remember.

--
Denis
* Re: Recovering from repository corruption
From: Tarmigan @ 2008-06-10 22:25 UTC
To: Denis Bueno; +Cc: Linus Torvalds, Git Mailing List

On Tue, Jun 10, 2008 at 3:09 PM, Denis Bueno <dbueno@gmail.com> wrote:
> It is the same computer on which the problem occurred last time.  It's
> an OS X 10.4 macbook pro.  I haven't noticed corruption in other
> places, but it's fair to assume it's occurring.  I'll have to boot off
> my install disk and fsck the drive....

Do you have fink installed?  Do you have the openssl fink package
installed?  Vger seems to have swallowed my original reply, but see
this thread:
http://marc.info/?l=git&m=120787191106549&w=2

If so, try removing the fink openssl packages and reinstalling git.

Do you push from this machine often?  If you do, then this probably is
not your problem as you would have seen it earlier.

-Tarmigan
* Re: Recovering from repository corruption
From: Denis Bueno @ 2008-06-10 22:41 UTC
To: Tarmigan; +Cc: Linus Torvalds, Git Mailing List

On Tue, Jun 10, 2008 at 18:25, Tarmigan <tarmigan+git@gmail.com> wrote:
> Do you have fink installed?  Do you have the openssl fink package
> installed?  Vger seems to have swallowed my original reply, but see
> this thread:
> http://marc.info/?l=git&m=120787191106549&w=2
> If so, try removing the fink openssl packages and reinstalling git.

I use macports.

> Do you push from this machine often?  If you do, then this probably is
> not your problem as you would have seen it earlier.

Yes, almost exclusively.  ... That is an odd problem.  Thanks for the
suggestion.

--
Denis
* Re: Recovering from repository corruption
From: Linus Torvalds @ 2008-06-10 22:45 UTC
To: Denis Bueno; +Cc: Git Mailing List

On Tue, 10 Jun 2008, Denis Bueno wrote:
>
> > Do you have some odd filesystem in play? Was the current corruption in a
> > similar environment as the old one? IOW, I'm trying to find a pattern
> > here, to see if there might be something we can do about it..
>
> I can't remember if the old one happened after a panic or not, but I'd
> bet it did.  The filesystem is HFS+, as indeed most OS X 10.4
> installations are.  Maybe the HD has been going south?  However, that
> doesn't seem likely, since when I got the computer it was new, and
> that was around Jun 2007.

Yeah, it's almost certainly not the disk. Disks do go bad, but the
behavior tends to be rather different when they do (usually you will get
read errors with uncorrectable CRC failures, and you'd know that _very_
clearly).

Sure, I could imagine something like the sector remapping could be flaking
out on you, but that sounds really unlikely. Especially since:

> > But it *sounds* like the objects you lost were literally old ones, no? Ie
> > the lost stuff wasn't something you had committed in the last five minutes
> > or so? If so, then you really do seem to have a filesystem that corrupts
> > *old* files when it crashes. That's fairly scary. What FS is it?
>
> No, in fact I had just committed those changes not 10 minutes before
> the panic.  Last time they were also fresh changes, although perhaps
> older than 10 minutes.  I can't remember.

Oh, ok. If so, then this is much less worrisome, and is in fact almost
"normal" HFS+ behaviour. It is a journaling filesystem, but it only
journals metadata, so the filenames and inodes will be fine after a crash,
but the contents will be random.

[ Yeah, yeah, I know - it sounds rather stupid, but it's a common kind of
  stupidity. The journaling essentially protects the only thing that fsck
  can find. Ext3 does similar things in "writeback" mode - but you should
  use "data=ordered" which writes out the data before metadata.

  Basically, such journaling doesn't help data integrity per se, but it
  does mean that the metadata is ok, and that in turn means that while the
  file contents won't be dependable, at least things like free block
  bitmaps etc hopefully are.

  That in turn hopefully means that new file allocations won't be crapping
  out all over old ones etc due to bad resource allocations, so while it
  doesn't mean that the data is trust-worthy, it at least means that you
  can trust _some_ things ]

If your machine crashes often, you could trivially add a "sync" to your
commit hook. That would make things better. And maybe we should have a
"safe mode" that does these things more carefully. You would definitely
want to turn it on on that machine.

Are you doing something special to make the machine crash so much? Or do
OS X machines always crash, and Apple PR is just so good that people aren't
aware of it?

Anyway, I'll think about sane ways to add a "safe" mode without making it
_too_ painful. In the meantime, here's a trial patch that you should
probably use. It does slow things down, but hopefully not too much. (I
really don't much like it - but I think this is a good change, and I just
need to come up with a better way to do the fsync() than to be totally
synchronous about it.)

It's going to make big "git add" calls *much* slower, so I'm not very
happy about it (especially since we don't actually care that deeply about
the files really being there until much later, so doing something
asynchronous would be perfectly acceptable), but for you this is
definitely worth-while.

		Linus

---
 sha1_file.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index adcf37c..86a653b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2105,6 +2105,15 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
 	return 0;
 }
 
+/* Finalize a file on disk, and close it. */
+static void close_sha1_file(int fd)
+{
+	fsync_or_die(fd, "sha1 file");
+	fchmod(fd, 0444);
+	if (close(fd) != 0)
+		die("unable to write sha1 file");
+}
+
 static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 			      void *buf, unsigned long len, time_t mtime)
 {
@@ -2170,9 +2179,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 
 	if (write_buffer(fd, compressed, size) < 0)
 		die("unable to write sha1 file");
-	fchmod(fd, 0444);
-	if (close(fd))
-		die("unable to write sha1 file");
+	close_sha1_file(fd);
 	free(compressed);
 
 	if (mtime) {
@@ -2350,9 +2357,7 @@ int write_sha1_from_fd(const unsigned char *sha1, int fd, char *buffer,
 	} while (1);
 	inflateEnd(&stream);
 
-	fchmod(local, 0444);
-	if (close(local) != 0)
-		die("unable to write sha1 file");
+	close_sha1_file(local);
 	SHA1_Final(real_sha1, &c);
 	if (ret != Z_STREAM_END) {
 		unlink(tmpfile);
* Re: Recovering from repository corruption
  2008-06-10 22:45 ` Linus Torvalds
@ 2008-06-10 23:00 ` Linus Torvalds
  2008-06-11 0:43 ` Nicolas Pitre
  1 sibling, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2008-06-10 23:00 UTC
To: Denis Bueno; +Cc: Git Mailing List

On Tue, 10 Jun 2008, Linus Torvalds wrote:
>
> It's going to make big "git add" calls *much* slower, so I'm not very
> happy about it (especially since we don't actually care that deeply about
> the files really being there until much later, so doing something
> asynchronous would be perfectly acceptable), but for you this is
> definitely worth-while.

For me, on the whole kernel, on a pretty good system:

 - before:

	[torvalds@woody test-it-out]$ time git add .

	real	0m7.986s
	user	0m6.404s
	sys	0m1.456s

 - after:

	[torvalds@woody test-it-out]$ time ~/git/git-add .

	real	0m52.693s
	user	0m7.416s
	sys	0m2.516s

so it's definitely quite noticeable in that simplistic form.

A more interesting patch would use aio_fsync(), and then just wait for them
at the end with aio_return(). Not that I love AIO, but this is definitely a
case where it would make sense to do (of course, systems without AIO
support would then fall back to regular fsync()).

I will have to think about this.

		Linus
* Re: Recovering from repository corruption
  2008-06-10 22:45 ` Linus Torvalds
  2008-06-10 23:00 ` Linus Torvalds
@ 2008-06-11 0:43 ` Nicolas Pitre
  2008-06-11 1:39 ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Nicolas Pitre @ 2008-06-11 0:43 UTC
To: Linus Torvalds; +Cc: Denis Bueno, Git Mailing List

On Tue, 10 Jun 2008, Linus Torvalds wrote:

> Anyway, I'll think about sane ways to add a "safe" mode without making it
> _too_ painful. In the meantime, here's a trial patch that you should
> probably use. It does slow things down, but hopefully not too much.
>
> (I really don't much like it - but I think this is a good change, and I
> just need to come up with a better way to do the fsync() than to be
> totally synchronous about it.)
>
> It's going to make big "git add" calls *much* slower, so I'm not very
> happy about it (especially since we don't actually care that deeply about
> the files really being there until much later, so doing something
> asynchronous would be perfectly acceptable), but for you this is
> definitely worth-while.

I don't like it at all.

I think this only gives a false sense of security with a huge
performance cost. If the machine crashes at the right moment, the
object will still be half written/fsync'd and you'll be in the same
situation again. And because we don't overwrite existing objects (again
for performance reasons), then a corrupted blob object will remain
corrupted even if you reattempt the commit later. So doing the fsync
only when the commit object is written isn't a good solution either.

I wonder if supporting crashy systems is worth that cost. If Denis'
laptop is the odd case then a sync in the commit hook might be plenty
sufficient. Personally I'd simply replace the OS or the machine for
something more reliable.


Nicolas
* Re: Recovering from repository corruption
  2008-06-11 0:43 ` Nicolas Pitre
@ 2008-06-11 1:39 ` Linus Torvalds
  2008-06-11 1:47 ` Nicolas Pitre
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2008-06-11 1:39 UTC
To: Nicolas Pitre; +Cc: Denis Bueno, Git Mailing List

On Tue, 10 Jun 2008, Nicolas Pitre wrote:
>
> I think this only gives a false sense of security with a huge
> performance cost. If the machine crashes at the right moment, the
> object will still be half written/fsync'd and you'll be in the same
> situation again.

No you wouldn't.

We do the write and the fsync() of the write to a _temporary_ filename. We
do the rename _after_ the fsync.

So you'd never have a half-written object file.

That said, I do agree that the bigger problem is that Denis' machine is
simply so unreliable.

		Linus
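
[ Editorial sketch, not part of the original message. The ordering described
  above (write under a temporary name, fsync(), and only then rename) might
  look roughly like the following on a POSIX system. This is not git's code:
  write_file_durably() and the ".tmp_XXXXXX" suffix are hypothetical names
  picked for illustration. The sketch also does not fsync() the containing
  directory, which some filesystems may need before the rename itself
  survives a crash. ]

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not git's implementation):
 * write `len` bytes to `path` so that a crash leaves either no file
 * at `path` or a file with the complete contents.
 * Returns 0 on success, -1 on failure.
 */
int write_file_durably(const char *path, const void *buf, size_t len)
{
	char tmp[4096];
	const char *p = buf;
	int fd, n;

	assert(path != NULL && (buf != NULL || len == 0));

	/* 1. Create a temporary file next to the final name. */
	n = snprintf(tmp, sizeof(tmp), "%s.tmp_XXXXXX", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;
	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;

	/* 2. Write everything, retrying on short writes and EINTR. */
	while (len > 0) {
		ssize_t written = write(fd, p, len);
		if (written < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		p += written;
		len -= (size_t)written;
	}

	/* 3. Push the data to stable storage while the name is still temporary. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		fd = -1;
		goto fail;
	}
	fd = -1;

	/* 4. Only now give the finished file its final name. */
	if (rename(tmp, path) != 0)
		goto fail;
	return 0;

fail:
	if (fd >= 0)
		close(fd);
	unlink(tmp);
	return -1;
}
```

[ A caller would use it as write_file_durably("objects/ab/cdef", data, size).
  Because the final name only appears after fsync() has returned, a crash
  before step 4 leaves at worst a stray temporary file, never a truncated
  file under the final name. ]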
* Re: Recovering from repository corruption
  2008-06-11 1:39 ` Linus Torvalds
@ 2008-06-11 1:47 ` Nicolas Pitre
  0 siblings, 0 replies; 31+ messages in thread
From: Nicolas Pitre @ 2008-06-11 1:47 UTC
To: Linus Torvalds; +Cc: Denis Bueno, Git Mailing List

On Tue, 10 Jun 2008, Linus Torvalds wrote:

>
>
> On Tue, 10 Jun 2008, Nicolas Pitre wrote:
> >
> > I think this only gives a false sense of security with a huge
> > performance cost. If the machine crashes at the right moment, the
> > object will still be half written/fsync'd and you'll be in the same
> > situation again.
>
> No you wouldn't.
>
> We do the write and the fsync() of the write to a _temporary_ filename. We
> do the rename _after_ the fsync.

Ah, true. That part somehow evaded my mind.


Nicolas
* Re: Recovering from repository corruption
  2008-06-10 21:09 ` Linus Torvalds
  2008-06-10 21:22 ` Denis Bueno
@ 2008-06-10 21:27 ` Denis Bueno
  2008-06-10 22:52 ` Junio C Hamano
  2008-06-11 23:21 ` To graft or not to graft... (Re: Recovering from repository corruption) Stephen R. van den Berg
  3 siblings, 0 replies; 31+ messages in thread
From: Denis Bueno @ 2008-06-10 21:27 UTC
To: Linus Torvalds; +Cc: Git Mailing List

On Tue, Jun 10, 2008 at 17:09, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Ahh, ok. Yes, we should probably re-think our 'grafts' file thing, or at
> least not document it, because it's actually a wonderful way to just cause
> more corruption by hiding things (ie if you clone a repo with a grafts
> file, the result will now have neither the grafts file _nor_ the state
> that was hidden by it, so the result is guaranteed to be corrupt).

I'd argue in favor of documenting it, even if it's dangerous, unless
there's some other mechanism (rebase?) that would let me do what I
did? That is, to recover from corruption in a way that lets me
regenerate or ignore inexact, corrupted commits.

-- 
Denis
* Re: Recovering from repository corruption
  2008-06-10 21:09 ` Linus Torvalds
  2008-06-10 21:22 ` Denis Bueno
  2008-06-10 21:27 ` Denis Bueno
@ 2008-06-10 22:52 ` Junio C Hamano
  2008-06-11 23:21 ` To graft or not to graft... (Re: Recovering from repository corruption) Stephen R. van den Berg
  3 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2008-06-10 22:52 UTC
To: Linus Torvalds; +Cc: Denis Bueno, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Ahh, ok. Yes, we should probably re-think our 'grafts' file thing, or at
> least not document it, because it's actually a wonderful way to just cause
> more corruption by hiding things (ie if you clone a repo with a grafts
> file, the result will now have neither the grafts file _nor_ the state
> that was hidden by it, so the result is guaranteed to be corrupt).

"Graft and then clone" will not make the copied repository Ok. You need
to propagate the graft in some other way.

However, "Graft and then filter-branch" is a way to hide and get rid of
the broken thing in history etched in the objects. After that the
repository itself and a clone from it will not need the graft.

So I'd rather argue we should document it _differently_ (or just _better_)
than not document it.
* To graft or not to graft... (Re: Recovering from repository corruption)
  2008-06-10 21:09 ` Linus Torvalds
  ` (2 preceding siblings ...)
  2008-06-10 22:52 ` Junio C Hamano
@ 2008-06-11 23:21 ` Stephen R. van den Berg
  2008-06-11 23:34 ` Jakub Narebski
  2008-06-11 23:39 ` Linus Torvalds
  3 siblings, 2 replies; 31+ messages in thread
From: Stephen R. van den Berg @ 2008-06-11 23:21 UTC
To: Linus Torvalds; +Cc: Denis Bueno, Git Mailing List

Linus Torvalds wrote:
>more corruption by hiding things (ie if you clone a repo with a grafts
>file, the result will now have neither the grafts file _nor_ the state
>that was hidden by it, so the result is guaranteed to be corrupt).

This is kind of confusing.

As I understood it from the few shreds of documentation that actually
mention the grafts file, the grafts file is *not* being cloned.
Therefore, my assumption was that cloning a repository that has a grafts
file gives an identical result to cloning the same repository *without*
the grafts file present.

As I understand it now, the cloning process actually peeks at the grafts
file while cloning, and then doesn't copy it. This results in a rather
confusingly corrupt clone.

I suggest two things:
a. That during the cloning process, the grafts file is completely
   disregarded in any case at first.
b. Preferably the grafts file is copied as well (after cloning). I
   never really understood why the file is not being copied in the first
   place (anyone care to explain that?).
-- 
Sincerely,
           Stephen R. van den Berg.

Differentiation is an integral part of calculus.
* Re: To graft or not to graft... (Re: Recovering from repository corruption)
  2008-06-11 23:21 ` To graft or not to graft... (Re: Recovering from repository corruption) Stephen R. van den Berg
@ 2008-06-11 23:34 ` Jakub Narebski
  2008-06-11 23:39 ` Linus Torvalds
  1 sibling, 0 replies; 31+ messages in thread
From: Jakub Narebski @ 2008-06-11 23:34 UTC
To: Stephen R. van den Berg; +Cc: Linus Torvalds, Denis Bueno, Git Mailing List

"Stephen R. van den Berg" <srb@cuci.nl> writes:

> This is kind of confusing.
>
> As I understood it from the few shreds of documentation that actually
> mention the grafts file, the grafts file is *not* being cloned.
> Therefore, my assumption was that cloning a repository that has a grafts
> file gives an identical result to cloning the same repository *without*
> the grafts file present.
>
> As I understand it now, the cloning process actually peeks at the grafts
> file while cloning, and then doesn't copy it. This results in a rather
> confusingly corrupt clone.
>
> I suggest two things:
> a. That during the cloning process, the grafts file is completely
> disregarded in any case at first.
> b. Preferably the grafts file is copied as well (after cloning). I
> never really understood why the file is not being copied in the first
> place (anyone care to explain that?).

A bit of explanation: initially, I think, grafts were created as a means
to "graft" the historical repository (conversion from BitKeeper and from
patches) onto the current work repository (from when git was deemed
suitable as SCM for Linux kernel development). Nevertheless the mechanism
is generic enough to change history _locally_ in many strange ways (for
example shallow clone uses a kind of grafts).

Because the grafts file can be used to alter history, this totally
_bypasses_ the check given by the sha1 of a commit and by
cryptographically signed tags. It negates the security given by sha-1
signing. That's why using grafts must be a _conscious_ decision -
therefore they are purely local and not propagated.

(Also there was no place for grafts in the "smart" transport, i.e. the
git and ssh protocols. Think about what happens if both sides have
grafts files which differ...)

On the other hand, history _without_ grafts might not validate. I think
that is why we have the current confusing behavior...

-- 
Jakub Narebski
Poland
ShadeHawk on #git
* Re: To graft or not to graft... (Re: Recovering from repository corruption)
  2008-06-11 23:21 ` To graft or not to graft... (Re: Recovering from repository corruption) Stephen R. van den Berg
  2008-06-11 23:34 ` Jakub Narebski
@ 2008-06-11 23:39 ` Linus Torvalds
  2008-06-12 7:14 ` Johan Herland
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2008-06-11 23:39 UTC
To: Stephen R. van den Berg; +Cc: Denis Bueno, Git Mailing List

On Thu, 12 Jun 2008, Stephen R. van den Berg wrote:
>
> As I understood it from the few shreds of documentation that actually
> mention the grafts file, the grafts file is *not* being cloned.
> Therefore, my assumption was that cloning a repository that has a grafts
> file gives an identical result to cloning the same repository *without*
> the grafts file present.

That would probably be the right behaviour, but no - all our commit walkers
honor the grafts file.

Including the ones used for creating pack-files and thus a clone.

> As I understand it now, the cloning process actually peeks at the grafts
> file while cloning, and then doesn't copy it. This results in a rather
> confusingly corrupt clone.

Yes. The grafts-file was a mistake, but it's just barely useful to some
people that it's stayed alive. Sadly, those "some people" don't tend to
care enough about the problems it can cause.

> I suggest two things:
> a. That during the cloning process, the grafts file is completely
> disregarded in any case at first.

Yes.

And (a'): git-fsck and repacking should just consider it to be an
_additional_ source of parenthood rather than a _replacement_ source.

> b. Preferably the grafts file is copied as well (after cloning). I
> never really understood why the file is not being copied in the first
> place (anyone care to explain that?).

The grafts file isn't part of the object stream and refs, and clones (and
fetches) very much just copy the object database.

		Linus
* Re: To graft or not to graft... (Re: Recovering from repository corruption)
  2008-06-11 23:39 ` Linus Torvalds
@ 2008-06-12 7:14 ` Johan Herland
  2008-06-12 7:47 ` Jeff King
  0 siblings, 1 reply; 31+ messages in thread
From: Johan Herland @ 2008-06-12 7:14 UTC
To: git; +Cc: Linus Torvalds, Stephen R. van den Berg, Denis Bueno

On Thursday 12 June 2008, Linus Torvalds wrote:
> On Thu, 12 Jun 2008, Stephen R. van den Berg wrote:
> > As I understood it from the few shreds of documentation that actually
> > mention the grafts file, the grafts file is *not* being cloned.
> > Therefore, my assumption was that cloning a repository that has a
> > grafts file gives an identical result to cloning the same repository
> > *without* the grafts file present.
>
> That would probably be the right behaviour, but no - all our commit
> walkers honor the grafts file.
>
> Including the ones used for creating pack-files and thus a clone.
>
> > As I understand it now, the cloning process actually peeks at the
> > grafts file while cloning, and then doesn't copy it. This results in a
> > rather confusingly corrupt clone.
>
> Yes. The grafts-file was a mistake, but it's just barely useful to some
> people that it's stayed alive. Sadly, those "some people" don't tend to
> care enough about the problems it can cause.
>
> > I suggest two things:
> > a. That during the cloning process, the grafts file is completely
> > disregarded in any case at first.
>
> Yes.
>
> And (a'): git-fsck and repacking should just consider it to be an
> _additional_ source of parenthood rather than a _replacement_ source.
>
> > b. Preferably the grafts file is copied as well (after cloning). I
> > never really understood why the file is not being copied in the
> > first place (anyone care to explain that?).
>
> The grafts file isn't part of the object stream and refs, and clones (and
> fetches) very much just copy the object database.

AFAICS, there's already a perfectly fine way to distribute grafted history:
1. Add a grafts file
2. Run git-filter-branch
3. Remove grafts file
4. Distribute repo
5. Profit!

Since git-filter-branch turns grafted parentage into _real_ parentage,
there's no point in ever having a grafts file at all (except transiently
for telling git-filter-branch what to do).

I suggest we make commit walkers NOT obey the grafts file by default, but
instead require a --follow-grafts option to restore the current behaviour.
Then, we teach git-filter-branch to obey the grafts file (probably by
employing said --follow-grafts option). For those who want to hang on to
the current behaviour, they can create some config option that is
equivalent to always running with --follow-grafts.

The following is ugly, untested, undocumented, and obviously unfit for
inclusion:

diff --git a/commit.c b/commit.c
index 94d5b3d..3e9ebf7 100644
--- a/commit.c
+++ b/commit.c
@@ -7,6 +7,7 @@
 #include "revision.h"
 
 int save_commit_buffer = 1;
+int use_grafts = 0;
 
 const char *commit_type = "commit";
 
@@ -242,7 +243,7 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
 	char *bufptr = buffer;
 	unsigned char parent[20];
 	struct commit_list **pptr;
-	struct commit_graft *graft;
+	struct commit_graft *graft = NULL;
 	unsigned n_refs = 0;
 
 	if (item->object.parsed)
@@ -260,7 +261,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
 	bufptr += 46; /* "tree " + "hex sha1" + "\n" */
 	pptr = &item->parents;
 
-	graft = lookup_commit_graft(item->object.sha1);
+	if (use_grafts)
+		graft = lookup_commit_graft(item->object.sha1);
 	while (bufptr + 48 < tail && !memcmp(bufptr, "parent ", 7)) {
 		struct commit *new_parent;
 
diff --git a/commit.h b/commit.h
index 2d94d41..3e30aa0 100644
--- a/commit.h
+++ b/commit.h
@@ -22,6 +22,7 @@ struct commit {
 };
 
 extern int save_commit_buffer;
+extern int use_grafts;
 extern const char *commit_type;
 
 /* While we can decorate any object with a name, it's only used for commits.. */
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index d04c346..5ebe7cd 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -230,11 +230,11 @@ mkdir ../map || die "Could not create map/ directory"
 
 case "$filter_subdir" in
 "")
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@"
+		--follow-grafts --parents "$@"
 	;;
 *)
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@" -- "$filter_subdir"
+		--follow-grafts --parents "$@" -- "$filter_subdir"
 esac > ../revs || die "Could not get the commits"
 commits=$(wc -l <../revs | tr -d " ")
diff --git a/revision.c b/revision.c
index 5a1a948..ca98815 100644
--- a/revision.c
+++ b/revision.c
@@ -1059,6 +1059,10 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				revs->first_parent_only = 1;
 				continue;
 			}
+			if (!strcmp(arg, "--follow-grafts")) {
+				use_grafts = 1;
+				continue;
+			}
 			if (!strcmp(arg, "--reflog")) {
 				handle_reflog(revs, flags);
 				continue;
-- 
1.5.6.rc2.128.gf64ae


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
* Re: To graft or not to graft... (Re: Recovering from repository corruption)
  2008-06-12 7:14 ` Johan Herland
@ 2008-06-12 7:47 ` Jeff King
  2008-06-12 10:21 ` Johan Herland
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2008-06-12 7:47 UTC
To: Johan Herland; +Cc: git, Linus Torvalds, Stephen R. van den Berg, Denis Bueno

On Thu, Jun 12, 2008 at 09:14:21AM +0200, Johan Herland wrote:

> > The grafts file isn't part of the object stream and refs, and clones (and
> > fetches) very much just copy the object database.
>
> AFAICS, there's already a perfectly fine way to distribute grafted history:
> 1. Add a grafts file
> 2. Run git-filter-branch
> 3. Remove grafts file
> 4. Distribute repo
> 5. Profit!
>
> Since git-filter-branch turns grafted parentage into _real_ parentage,
> there's no point in ever having a grafts file at all (except transiently
> for telling git-filter-branch what to do).

But then you have rewritten all of the later commits, so you can no
longer talk to other people about them.

The kernel repo is split into "historical" and active repos. You can
graft the historical repo and get more far-reaching answers to things
like "git log" and "git blame". But if you run filter-branch, you can't
share development on that repo via push / pull to people who _don't_ use
the graft, since they don't share your history (and they probably don't
want to, because of the extra resources required to pull in the
historical chunk).

That being said, I don't know how common such a setup is. And you did
mention a "follow-grafts" config option for such people.

-Peff
* Re: To graft or not to graft... (Re: Recovering from repository corruption)
  2008-06-12 7:47 ` Jeff King
@ 2008-06-12 10:21 ` Johan Herland
  2008-06-12 12:20 ` Stephen R. van den Berg
  0 siblings, 1 reply; 31+ messages in thread
From: Johan Herland @ 2008-06-12 10:21 UTC
To: Jeff King; +Cc: git, Linus Torvalds, Stephen R. van den Berg, Denis Bueno

On Thursday 12 June 2008, Jeff King wrote:
> On Thu, Jun 12, 2008 at 09:14:21AM +0200, Johan Herland wrote:
> > > The grafts file isn't part of the object stream and refs, and
> > > clones (and fetches) very much just copy the object database.
> >
> > AFAICS, there's already a perfectly fine way to distribute grafted
> > history: 1. Add a grafts file
> > 2. Run git-filter-branch
> > 3. Remove grafts file
> > 4. Distribute repo
> > 5. Profit!
> >
> > Since git-filter-branch turns grafted parentage into _real_
> > parentage, there's no point in ever having a grafts file at all
> > (except transiently for telling git-filter-branch what to do).
>
> But then you have rewritten all of the later commits, so you can no
> longer talk to other people about them.

Correct. My point is that if you want to talk to people about revisions,
you'd better do it from a repo where people agree on the entire history.

On the other hand, if you want to do archaeology with grafts, you should be
aware that you are subverting one of the core guarantees provided by Git
(i.e. a commit id verifies full ancestry of a commit), and therefore
shouldn't communicate with other repos _at_ _all_, as other repos can easily
be confused (see [1]).

> The kernel repo is split into "historical" and active repos. You can
> graft the historical repo and get more far-reaching answers to things
> like "git log" and "git blame". But if you run filter-branch, you
> can't share development on that repo via push / pull to people who
> _don't_ use the graft, since they don't share your history (and they
> probably don't want to, because of the extra resources required to
> pull in the historical chunk).

Yes, by forcing git-filter-branch, you can no longer push/pull to/from such
a historical repo. But as this thread has already demonstrated, with grafts
you can't clone from such a repo today (nor pull in certain circumstances,
see [1]); so the way I see it, communication with this repo is _already_
limited. By disallowing grafts and forcing a rewrite of the entire repo, we
force these communication problems to be more explicit/visible.

> That being said, I don't know how common such a setup is. And you did
> mention a "follow-grafts" config option for such people.

Indeed. :)

AFAICS, there are two use cases for grafts:
1. As a preparation for rewriting the history with git-filter-branch.
2. For providing historical repos (like you mention above).

My suggestion only makes life harder for people in the second use case. If
there are many people in the second use case, and they deem
the "follow-grafts" config option unacceptable, I expect them to flame my
suggestion to a crisp, and we'll have to think of something else...


Have fun! :)

...Johan


[1]: Consider the following:

### Create a repo with one commit, A
$ mkdir foo
$ cd foo
$ git init
Initialized empty Git repository in /path/to/foo/.git/
$ echo foo > foo
$ git add foo
$ git commit -mA
Created initial commit fe2ec02: A
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo

### Clone the repo
$ cd ..
$ git clone /path/to/foo bar
Initialize bar/.git
Initialized empty Git repository in /path/to/bar/.git/

### Create 3 more commits in the original repo: A---B---C---D
$ cd foo
$ echo bar >> foo && git commit -a -mB
Created commit ad10f00: B
 1 files changed, 1 insertions(+), 0 deletions(-)
$ echo baz >> foo && git commit -a -mC
Created commit be96559: C
 1 files changed, 1 insertions(+), 0 deletions(-)
$ echo xyzzy >> foo && git commit -a -mD
Created commit f2bafe5: D
 1 files changed, 1 insertions(+), 0 deletions(-)

### Create a graft removing C from the history: A---B---D
$ echo "f2bafe58175e132077285e7fbbcec30859101d2e \
ad10f005205f61429dccda95e1442dabe31fbfbe" > .git/info/grafts

### Pull the recent changes into the clone
$ cd ../bar
$ git pull
remote: Counting objects: 8, done.
remote: Compressing objects: 100% (2/2), done.
Unpacking objects: 100% (6/6), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
error: Could not read be965599d99192f624b8d8bbf3cab412872586fc
From /path/to/foo/
 + fe2ec02...f2bafe5 master     -> origin/master  (forced update)
error: Could not read be965599d99192f624b8d8bbf3cab412872586fc
error: Could not read be965599d99192f624b8d8bbf3cab412872586fc
Auto-merged foo
CONFLICT (add/add): Merge conflict in foo
Automatic merge failed; fix conflicts and then commit the result.


AFAICS, git-pull can easily become just as confused by grafts as git-clone.
I wouldn't be surprised by a similar example for git-push. I can only draw
the conclusion that with current versions of Git, repos with grafts should
_never_ be made public.

-- 
Johan Herland, <johan@herland.net>
www.herland.net
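
[ Editorial sketch, not part of the original message. The graft written in
  the transcript above is one record: a 40-character hexadecimal commit id
  followed by the ids of the parents it should be treated as having,
  separated by single spaces. A minimal reader for one such record might
  look like the following. This is an illustration only, not git's own
  parser: struct graft_entry, parse_graft_line() and the limit of 16
  parents are assumptions made for the example. ]

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define GRAFT_HEX_LEN     40   /* length of a hex SHA-1 object name */
#define GRAFT_MAX_PARENTS 16   /* arbitrary limit chosen for this sketch */

/* Hypothetical in-memory form of one grafts record. */
struct graft_entry {
	char commit[GRAFT_HEX_LEN + 1];
	char parents[GRAFT_MAX_PARENTS][GRAFT_HEX_LEN + 1];
	int nr_parents;
};

/* Copy exactly 40 hex digits from *pp into out and advance *pp. */
static int read_hex_id(const char **pp, char *out)
{
	const char *p = *pp;
	int i;

	for (i = 0; i < GRAFT_HEX_LEN; i++) {
		/* A premature '\0' is not a hex digit, so we never read past it. */
		if (!isxdigit((unsigned char)p[i]))
			return -1;
		out[i] = p[i];
	}
	out[GRAFT_HEX_LEN] = '\0';
	*pp = p + GRAFT_HEX_LEN;
	return 0;
}

/*
 * Parse "<commit> [<parent> ...]" with an optional trailing newline.
 * Returns 0 on success, -1 if the record is malformed.
 */
int parse_graft_line(const char *line, struct graft_entry *out)
{
	const char *p = line;

	assert(line != NULL && out != NULL);
	out->nr_parents = 0;

	if (read_hex_id(&p, out->commit) != 0)
		return -1;

	/* Each parent is introduced by exactly one space. */
	while (*p == ' ') {
		p++;
		if (out->nr_parents == GRAFT_MAX_PARENTS)
			return -1;
		if (read_hex_id(&p, out->parents[out->nr_parents]) != 0)
			return -1;
		out->nr_parents++;
	}

	if (*p == '\n')
		p++;
	return *p == '\0' ? 0 : -1;
}
```

[ Fed the record from the transcript, this yields commit f2bafe58... (D)
  with the single parent ad10f005... (B), which is how C drops out of the
  history that the commit walkers see. A record with no parents at all
  would make the commit look like a root. ]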
* Re: To graft or not to graft... (Re: Recovering from repository corruption)
  2008-06-12 10:21 ` Johan Herland
@ 2008-06-12 12:20 ` Stephen R. van den Berg
  0 siblings, 0 replies; 31+ messages in thread
From: Stephen R. van den Berg @ 2008-06-12 12:20 UTC
To: Johan Herland; +Cc: Jeff King, git, Linus Torvalds, Denis Bueno

Johan Herland wrote:
>I can only draw the conclusion that with current versions of Git, repos
>with grafts should _never_ be made public.

Correct.
I still prefer my original suggestion, i.e. allow repos with grafts to
be cloned, yet disregard the grafts during the cloning process.

The trouble is that with your suggestion, it becomes a bit convoluted
when grafts are being used and when not. It already is complicated as
it is, so I suggest we try and keep git honest so that it does exactly
what one would expect (instead of documenting awkward behaviour).

As soon as time permits, I'll submit appropriate patches to implement
this, as well as some other sanity check patches which I've been
contemplating to help the grafter detect "bad" grafts as early as
possible.
-- 
Sincerely,
           Stephen R. van den Berg.

"Always look on the bright side of life!"
* Re: Recovering from repository corruption
  2008-06-10 17:26 Recovering from repository corruption Denis Bueno
  2008-06-10 17:55 ` Jakub Narebski
@ 2008-06-10 19:40 ` Nicolas Pitre
  2008-06-10 19:42 ` Denis Bueno
  1 sibling, 1 reply; 31+ messages in thread
From: Nicolas Pitre @ 2008-06-10 19:40 UTC
To: Denis Bueno; +Cc: Git Mailing List

On Tue, 10 Jun 2008, Denis Bueno wrote:

> I started a thread a while back about repository corruption. It
> manifested as a clone error and the thread is here:
>
> http://kerneltrap.org/mailarchive/git/2007/7/31/253475
>
> I just ran, again, into corruption after my laptop kernel-panic'd.
> (Ironically, at the moment I ran into the corruption I was trying to
> push my repo to a backup location.) Since that thread took place it
> seems a section about recovering from repo corruption was added to the
> manual --- but it assumes you can (or care to painstakingly) recreate
> each corrupted version.

Would you happen, by chance, to have another instance of that repository
somewhere else with the concerned objects in it?


Nicolas
* Re: Recovering from repository corruption
  2008-06-10 19:40 ` Recovering from repository corruption Nicolas Pitre
@ 2008-06-10 19:42 ` Denis Bueno
  0 siblings, 0 replies; 31+ messages in thread
From: Denis Bueno @ 2008-06-10 19:42 UTC
To: Nicolas Pitre; +Cc: Git Mailing List

On Tue, Jun 10, 2008 at 15:40, Nicolas Pitre <nico@cam.org> wrote:
>> (Ironically, at the moment I ran into the corruption I was trying to
>> push my repo to a backup location.)
>
> Would you happen, by chance, to have another instance of that repository
> somewhere else with the concerned objects in it?

Nope. I was *just* about to back it up.

-- 
Denis
end of thread (last message: 2008-06-12 12:21 UTC)

Thread overview: 31+ messages
2008-06-10 17:26 Recovering from repository corruption Denis Bueno
2008-06-10 17:55 ` Jakub Narebski
2008-06-10 19:38 ` Denis Bueno
2008-06-10 19:59 ` Jakub Narebski
2008-06-10 20:03 ` Denis Bueno
2008-06-10 20:14 ` Jakub Narebski
2008-06-10 20:35 ` Denis Bueno
2008-06-10 20:23 ` Linus Torvalds
2008-06-10 20:28 ` Denis Bueno
2008-06-10 21:09 ` Linus Torvalds
2008-06-10 21:22 ` Denis Bueno
2008-06-10 21:48 ` Linus Torvalds
2008-06-10 22:09 ` Denis Bueno
2008-06-10 22:25 ` Tarmigan
2008-06-10 22:41 ` Denis Bueno
2008-06-10 22:45 ` Linus Torvalds
2008-06-10 23:00 ` Linus Torvalds
2008-06-11 0:43 ` Nicolas Pitre
2008-06-11 1:39 ` Linus Torvalds
2008-06-11 1:47 ` Nicolas Pitre
2008-06-10 21:27 ` Denis Bueno
2008-06-10 22:52 ` Junio C Hamano
2008-06-11 23:21 ` To graft or not to graft... (Re: Recovering from repository corruption) Stephen R. van den Berg
2008-06-11 23:34 ` Jakub Narebski
2008-06-11 23:39 ` Linus Torvalds
2008-06-12 7:14 ` Johan Herland
2008-06-12 7:47 ` Jeff King
2008-06-12 10:21 ` Johan Herland
2008-06-12 12:20 ` Stephen R. van den Berg
2008-06-10 19:40 ` Recovering from repository corruption Nicolas Pitre
2008-06-10 19:42 ` Denis Bueno