* Schrödinger's diff @ 2009-07-07 6:53 Eric Raible 2009-07-07 7:28 ` Johannes Sixt ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Eric Raible @ 2009-07-07 6:53 UTC (permalink / raw) To: Git Mailing List git version 1.6.3.2.1299.gee46c (msysgit) In trying to track down some annoying crlf corruption in a repo I have found a Schrödinger's diff. In other words it's unknown whether the diff will produce output or not on any particular run of the following script. Sometimes it does, and sometimes it doesn't (seems to be about 50/50). But either way in any given repo rerunning the git-diff will always give the same result. Doing an "git ls-tree HEAD" gives an identical tree in both cases. Can anyone explain why the output to this is not deterministic? I'm at a complete loss. # Clean up from last run and start over rm -rf .git has-crlf git init git config core.autocrlf false # Add a "bad" file perl -e 'printf( "12%c%c", 0xd, 0xa )' > has-crlf git add has-crlf git commit -m"add crlf" # I realize that switching is ill-advised, but I'm # trying to track down a possibly related problem... git config core.autocrlf true # This sometimes produces output and sometimes it doesn't. # Either way rerunning just git-diff always gives the same result # as the first run in this repo. git diff - Eric ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 6:53 Schrödinger's diff Eric Raible @ 2009-07-07 7:28 ` Johannes Sixt 2009-07-07 7:52 ` Andreas Ericsson 2009-07-07 17:36 ` Daniel Barkalow 2 siblings, 0 replies; 13+ messages in thread From: Johannes Sixt @ 2009-07-07 7:28 UTC (permalink / raw) To: Eric Raible; +Cc: Git Mailing List Eric Raible schrieb: > Sometimes it does, and sometimes it doesn't (seems to be about > 50/50). But either way in any given repo rerunning the git-diff will > always give the same result. > > Doing an "git ls-tree HEAD" gives an identical tree in both cases. > > Can anyone explain why the output to this is not deterministic? > I'm at a complete loss. > > # Clean up from last run and start over > rm -rf .git has-crlf > git init > git config core.autocrlf false > > # Add a "bad" file > perl -e 'printf( "12%c%c", 0xd, 0xa )' > has-crlf > git add has-crlf > git commit -m"add crlf" > > # I realize that switching is ill-advised, but I'm > # trying to track down a possibly related problem... > git config core.autocrlf true > > # This sometimes produces output and sometimes it doesn't. > # Either way rerunning just git-diff always gives the same result > # as the first run in this repo. > git diff If I put this in a script, I get a diff in 9 out of 10 runs. If I insert 'sleep 1' right before the 'git add', I never get a diff. I'm handing this off to people who care about core.autocrlf and who know how racily-clean index entries (not) work ;) -- Hannes ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 6:53 Schrödinger's diff Eric Raible 2009-07-07 7:28 ` Johannes Sixt @ 2009-07-07 7:52 ` Andreas Ericsson 2009-07-07 17:36 ` Daniel Barkalow 2 siblings, 0 replies; 13+ messages in thread From: Andreas Ericsson @ 2009-07-07 7:52 UTC (permalink / raw) To: Eric Raible; +Cc: Git Mailing List Eric Raible wrote: > git version 1.6.3.2.1299.gee46c (msysgit) > > In trying to track down some annoying crlf corruption in a repo > I have found a Schrödinger's diff. In other words it's unknown > whether the diff will produce output or not on any particular run > of the following script. > > Sometimes it does, and sometimes it doesn't (seems to be about > 50/50). But either way in any given repo rerunning the git-diff will > always give the same result. > I don't get the same result in the same repo, although it only differs in 1-1.5% of the tests. > Doing an "git ls-tree HEAD" gives an identical tree in both cases. > > Can anyone explain why the output to this is not deterministic? On Linux with git version 1.6.3.3.354.g3b4cc Pasting your commands into "repro.sh", but redirecting output from git commit to /dev/null, and then running the following commands has yielded 9 to 15 sample.$i files over 5 tries of the following: sh repro.sh > correct for i in $(seq 1 1000); do sh repro.sh > sample && cmp sample correct >/dev/null || \ { echo "fail $i" && cp sample sample.$i; }; done > I'm at a complete loss. > Inserting "sync" between calls as shown below doesn't fix the issue (although it drops from 9-15 to 4-10 fails on Linux; Not a very good improvement and only two test-runs). I have no idea how it can behave so strangely, and I refuse to believe that the ext3 fs driver allows dirty reads. > # Clean up from last run and start over > rm -rf .git has-crlf > git init > git config core.autocrlf false > > # Add a "bad" file > perl -e 'printf( "12%c%c", 0xd, 0xa )' > has-crlf > git add has-crlf > git commit -m"add crlf" > sync > # I realize that switching is ill-advised, but I'm > # trying to track down a possibly related problem... > git config core.autocrlf true > sync > # This sometimes produces output and sometimes it doesn't. > # Either way rerunning just git-diff always gives the same result > # as the first run in this repo. > git diff > -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 6:53 Schrödinger's diff Eric Raible 2009-07-07 7:28 ` Johannes Sixt 2009-07-07 7:52 ` Andreas Ericsson @ 2009-07-07 17:36 ` Daniel Barkalow 2009-07-07 19:36 ` Jeff King 2 siblings, 1 reply; 13+ messages in thread From: Daniel Barkalow @ 2009-07-07 17:36 UTC (permalink / raw) To: Eric Raible; +Cc: Git Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 2272 bytes --] On Mon, 6 Jul 2009, Eric Raible wrote: > git version 1.6.3.2.1299.gee46c (msysgit) > > In trying to track down some annoying crlf corruption in a repo > I have found a Schrödinger's diff. In other words it's unknown > whether the diff will produce output or not on any particular run > of the following script. > > Sometimes it does, and sometimes it doesn't (seems to be about > 50/50). But either way in any given repo rerunning the git-diff will > always give the same result. > > Doing an "git ls-tree HEAD" gives an identical tree in both cases. > > Can anyone explain why the output to this is not deterministic? > I'm at a complete loss. > > # Clean up from last run and start over > rm -rf .git has-crlf > git init > git config core.autocrlf false > > # Add a "bad" file > perl -e 'printf( "12%c%c", 0xd, 0xa )' > has-crlf > git add has-crlf If has-crlf and .git/index have the same timestamp, git does not know whether the file has been modified afterwards or not. If they have different timestamps, git knows the file hasn't been modified after the add. (More precisely, the index contains the mtime of the file, and it will agree with the file system. However, if the timestamp on the index matches a timestamp *in* the index, that means that, when the index was written, the time period represented by that timestamp was not yet over when git looked at the file. Therefore, the file could have changed again after that time and still gotten the same timestamp it already had. This means that git can't be sure that there's nothing new to see in the filesystem.) > git commit -m"add crlf" > > # I realize that switching is ill-advised, but I'm > # trying to track down a possibly related problem... > git config core.autocrlf true > > # This sometimes produces output and sometimes it doesn't. > # Either way rerunning just git-diff always gives the same result > # as the first run in this repo. > git diff If git knows the file hasn't been modified, it doesn't produce a diff. If it doesn't know the file hasn't been modified, it looks at the actual contents and it find that the result of reading the disk applying autocrlf now doesn't match the contents of the index. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 17:36 ` Daniel Barkalow @ 2009-07-07 19:36 ` Jeff King 2009-07-07 19:48 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Jeff King @ 2009-07-07 19:36 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Eric Raible, Git Mailing List On Tue, Jul 07, 2009 at 01:36:08PM -0400, Daniel Barkalow wrote: > > # I realize that switching is ill-advised, but I'm > > # trying to track down a possibly related problem... > > git config core.autocrlf true > > > > # This sometimes produces output and sometimes it doesn't. > > # Either way rerunning just git-diff always gives the same result > > # as the first run in this repo. > > git diff > > If git knows the file hasn't been modified, it doesn't produce a diff. > > If it doesn't know the file hasn't been modified, it looks at the actual > contents and it find that the result of reading the disk applying autocrlf > now doesn't match the contents of the index. Yes, that was my analysis upon reading the original mail, as well (and I have been bitten by this before while testing crlf stuff). The same thing can happen with clean/smudge, I think. When you set up config that changes how we view worktree files (like crlf or clean/smudge) and there is already cached stat information in the index, you really need to invalidate the matching stat information in the index to get sane results[1]. It might be nice for "git config" to do this for you, but: 1. You could just as easily be hand-editing the config. 2. It feels wrong from a modularity standpoint. Right now "git config" doesn't actually care about the semantics of config, just the syntax. Which makes it exactly equivalent to hand-editing. 3. It doesn't cover every situation. Files can also be "changed" in this way by editing .gitattributes, which can be changed manually or by any number of git commands (like checkout, reset, etc). So I think automatically detecting this situation would require flags in the index to say "this stat information is valid only over these particular settings". And you would want it per-file to avoid having to re-hash every file when you change the .gitattributes for one file. The command using the index would check it. But even that might have holes, I'm afraid -- we don't always look at all of the config in every command, though perhaps we do for such core functionality. -Peff [1] Is there an easy way to do this with update-index? I didn't see one, and had to resort to "git read-tree HEAD". ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 19:36 ` Jeff King @ 2009-07-07 19:48 ` Junio C Hamano 2009-07-07 19:54 ` Jeff King 2009-07-07 20:30 ` Eric Raible 0 siblings, 2 replies; 13+ messages in thread From: Junio C Hamano @ 2009-07-07 19:48 UTC (permalink / raw) To: Jeff King; +Cc: Daniel Barkalow, Eric Raible, Git Mailing List Jeff King <peff@peff.net> writes: > .... But even that might have holes, > I'm afraid -- we don't always look at all of the config in every > command, though perhaps we do for such core functionality. I personally do not think it is worth it. If you change the crlf, clean/smudge, or anything of that sort, just doing a "rm .git/index" followed by "git reset --hard" would restore sanity to your work tree, no? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 19:48 ` Junio C Hamano @ 2009-07-07 19:54 ` Jeff King 2009-07-07 22:22 ` Junio C Hamano 2009-07-07 20:30 ` Eric Raible 1 sibling, 1 reply; 13+ messages in thread From: Jeff King @ 2009-07-07 19:54 UTC (permalink / raw) To: Junio C Hamano; +Cc: Daniel Barkalow, Eric Raible, Git Mailing List On Tue, Jul 07, 2009 at 12:48:45PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > .... But even that might have holes, > > I'm afraid -- we don't always look at all of the config in every > > command, though perhaps we do for such core functionality. > > I personally do not think it is worth it. If you change the crlf, > clean/smudge, or anything of that sort, just doing a "rm .git/index" > followed by "git reset --hard" would restore sanity to your work tree, no? Yes, that works fine, but: 1. It blows away anything unrelated you might have staged. 2. You have to know to do it (and you get very confusing results if you don't), which makes it very unfriendly for newbies. 3. You have to know to do it, and it isn't documented. :) (3) at least is not too hard to address. And perhaps (2) is not a big enough issue to care about. This is not a problem we have seen on the list a lot. I suspect it is because most CRLF users are on Windows, and therefore have it setup before the tree is checked out, and there are simply not all that many clean/smudge users. -Peff ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 19:54 ` Jeff King @ 2009-07-07 22:22 ` Junio C Hamano 2009-07-08 0:17 ` Eric Raible 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2009-07-07 22:22 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Daniel Barkalow, Eric Raible, Git Mailing List Jeff King <peff@peff.net> writes: > Yes, that works fine, but: > > 1. It blows away anything unrelated you might have staged. > > 2. You have to know to do it (and you get very confusing results if > you don't), which makes it very unfriendly for newbies. > > 3. You have to know to do it, and it isn't documented. :) > > (3) at least is not too hard to address. And perhaps (2) is not a big > enough issue to care about. This is not a problem we have seen on the > list a lot. I suspect it is because most CRLF users are on Windows, and > therefore have it setup before the tree is checked out, and there are > simply not all that many clean/smudge users. A much more important reason is that it is a one-time event. You notice that you screwed up the configuration to use your peculiar work tree representation, and you fix it once and for all. Because not only it is a one-time event but because it is a big-deal event, I do not think it is something people even would want to think about doing it while having local changes, so I suspect #1 is also a non-issue. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 22:22 ` Junio C Hamano @ 2009-07-08 0:17 ` Eric Raible 2009-07-08 2:54 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Eric Raible @ 2009-07-08 0:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Daniel Barkalow, Git Mailing List On Tue, Jul 7, 2009 at 3:22 PM, Junio C Hamano<gitster@pobox.com> wrote: > You notice > that you screwed up the configuration to use your peculiar work tree > representation, and you fix it once and for all. A kinda dim-witted question: So what's the best way of "fixing once and for all" a repo infected with carriage returns when you want to use autocrlf=true moving forward? And a hopefully less annoying one: Would you accept a patch explaining how "git reset --hard" doesn't actually rebuild the index from scratch, and how "git read-head" might be recommended in some weird situations? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-08 0:17 ` Eric Raible @ 2009-07-08 2:54 ` Junio C Hamano 2009-07-08 6:13 ` Eric Raible 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2009-07-08 2:54 UTC (permalink / raw) To: Eric Raible; +Cc: Jeff King, Daniel Barkalow, Git Mailing List Eric Raible <raible@gmail.com> writes: > So what's the best way of "fixing once and for all" a repo infected with > carriage returns when you want to use autocrlf=true moving forward? Didn't "rm -f .git/index && git reset --hard HEAD" work? > And a hopefully less annoying one: > > Would you accept a patch explaining how "git reset --hard" doesn't > actually rebuild the index from scratch,... Absolutely. > ... and how "git read-head" might > be recommended in some weird situations? I am less certain about that; people may have easier-to-read solution than the one with read-tree, I would suspect. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-08 2:54 ` Junio C Hamano @ 2009-07-08 6:13 ` Eric Raible 0 siblings, 0 replies; 13+ messages in thread From: Eric Raible @ 2009-07-08 6:13 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Daniel Barkalow, Git Mailing List On Tue, Jul 7, 2009 at 7:54 PM, Junio C Hamano<gitster@pobox.com> wrote: > Eric Raible <raible@gmail.com> writes: > >> So what's the best way of "fixing once and for all" a repo infected with >> carriage returns when you want to use autocrlf=true moving forward? > > Didn't "rm -f .git/index && git reset --hard HEAD" work? Will, it "worked" in the sense that it usually [1] allows "git diff --name-only" to correctly show the files that were checked in with crlf endings and core.autocrlf false. Those files are then out-of-date when core.autocrlf is true and .git/index is up-to-date. In other words, that first step worked around the fact that sometimes that .git/index was out of date. By "fixing it once and for all" I was trying to refer to creating the correct commit to abolish the carriage returns from my repo. I ended up with this crude-but-obvious loop which generates many LF->CRLF warnings [2]: for i in `git diff --name-only`; do echo $i sed 's/0x0D//' < $i > foo mv foo $i git add $i done >> Would you accept a patch explaining how "git reset --hard" doesn't >> actually rebuild the index from scratch,... > > Absolutely. I'll try to get to it, given the $dayjob / $significant_other constraints. - Eric [1] I've had cases where for whatever reasons a "git read-tree HEAD" seemed to be required, but I don't have the recipe yet. [2] Which can be abolished by wrapping it in autocrlf=false before and autocrlf=true after ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 19:48 ` Junio C Hamano 2009-07-07 19:54 ` Jeff King @ 2009-07-07 20:30 ` Eric Raible 2009-07-07 20:48 ` Jeff King 1 sibling, 1 reply; 13+ messages in thread From: Eric Raible @ 2009-07-07 20:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Daniel Barkalow, Git Mailing List On Tue, Jul 7, 2009 at 12:48 PM, Junio C Hamano<gitster@pobox.com> wrote: > Jeff King <peff@peff.net> writes: > >> .... But even that might have holes, >> I'm afraid -- we don't always look at all of the config in every >> command, though perhaps we do for such core functionality. > > I personally do not think it is worth it. If you change the crlf, > clean/smudge, or anything of that sort, just doing a "rm .git/index" > followed by "git reset --hard" would restore sanity to your work tree, no? Is there any technical reason why "git reset --hard" shouldn't repopulate the index by doing a "git read-tree" or equivalent [1]? After all the docs claim it "Matches the working tree and index to that of the tree being switched to". Except in this case it doesn't. The resulting .index is an invalid representation of the tree. Sanity can be restored with "git read-tree HEAD" (as Jeff suggested), but that's hardly intuitive. - Eric [1] For instance anything at all (possibly involving time stamps) such that the index will truly match the tree. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Schrödinger's diff 2009-07-07 20:30 ` Eric Raible @ 2009-07-07 20:48 ` Jeff King 0 siblings, 0 replies; 13+ messages in thread From: Jeff King @ 2009-07-07 20:48 UTC (permalink / raw) To: Eric Raible; +Cc: Junio C Hamano, Daniel Barkalow, Git Mailing List On Tue, Jul 07, 2009 at 01:30:47PM -0700, Eric Raible wrote: > > I personally do not think it is worth it. If you change the crlf, > > clean/smudge, or anything of that sort, just doing a "rm .git/index" > > followed by "git reset --hard" would restore sanity to your work tree, no? > > Is there any technical reason why "git reset --hard" shouldn't repopulate > the index by doing a "git read-tree" or equivalent [1]? After all the > docs claim it > "Matches the working tree and index to that of the tree being switched to". Yes; relying on the stat cache is what makes "git reset --hard" really fast, instead of having to re-hash each file. The real problem is that we are invalidating the contents of that cache, but not marking the entry as dirty. Right now we don't mark the entries, and you get stale data until you manually mark them as dirty. But marking them dirty on every reset will mean that reset can't make use of the cache, which is slow. So ideally we would have a way of marking them dirty only when necessary. One way would be a "git reset --hard --reset-me-harder", but that obviously still involves manual work; you're just making the command a little easier to find and use. As I suggested before, you could include the changed bits (i.e., the attributes and config) as part of the cache validity information. But I suspect it would be hard to implement, as it involves new fields in the index. It also involves arbitrary user commands, so getting it 100% right would be impossible (e.g., we can record the string "my-clean-filter" as the filter name, but we can't know when that filter's behavior has changed); however, in practice, I think we can assume that a string containing a text command will have stable output. -Peff ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-07-08 6:13 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-07-07 6:53 Schrödinger's diff Eric Raible 2009-07-07 7:28 ` Johannes Sixt 2009-07-07 7:52 ` Andreas Ericsson 2009-07-07 17:36 ` Daniel Barkalow 2009-07-07 19:36 ` Jeff King 2009-07-07 19:48 ` Junio C Hamano 2009-07-07 19:54 ` Jeff King 2009-07-07 22:22 ` Junio C Hamano 2009-07-08 0:17 ` Eric Raible 2009-07-08 2:54 ` Junio C Hamano 2009-07-08 6:13 ` Eric Raible 2009-07-07 20:30 ` Eric Raible 2009-07-07 20:48 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).