From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Mislav Marohnic <mislav@github.com>
Subject: [RFH] eol=lf on existing mixed line-ending files
Date: Thu, 7 Apr 2011 19:15:57 -0400
Message-ID: <20110407231556.GA10868@sigill.intra.peff.net>
I investigated some odd git behavior with the EOL gitattributes today,
and I'm curious to hear what others on the list think of what git does.
In particular, index raciness means git produces non-deterministic
results in this case.
The repo in question has a gitattributes file with "* crlf=input" (which
we would spell "eol=lf" these days, but the results are the same), but
still contains some files with mixed line endings. You can reproduce
that setup with:
  git init repo &&
  cd repo &&
  {
    printf 'one\n' &&
    printf 'two\r\n'
  } >mixed &&
  git add mixed &&
  git commit -m one &&
  echo '* eol=lf' >.gitattributes
Now if we run "git status" or "git diff", it will let us know that
"mixed" is modified, insofar as adding and committing it would perform
the LF conversion.
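To illustrate why "mixed" shows up as modified, here is a rough sketch
that approximates the checkin-side conversion outside of git. The
to_lf and would_convert helpers and the temporary file name are made up
for this example, and the sed expression only imitates the CRLF-to-LF
step; it is not git's actual code path.

```shell
# Approximate the eol=lf checkin conversion: drop a CR that directly
# precedes a LF. A stand-in for illustration, not git's own code.
to_lf() {
	cr=$(printf '\r')
	sed "s/$cr\$//" "$1"
}

# Report whether that conversion would change the file's content,
# which is roughly what makes status call the file modified.
would_convert() {
	if to_lf "$1" | cmp -s - "$1"
	then
		echo "clean"
	else
		echo "modified"
	fi
}

# Same content as the "mixed" file in the recipe above.
demo="${TMPDIR:-/tmp}/mixed-demo.$$"
printf 'one\ntwo\r\n' >"$demo"
would_convert "$demo"
rm -f "$demo"
```

For the two-line file this prints "modified", since the second line
would lose its CR on checkin.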
Now we come to the first confusing behavior. Generally one would expect
the working directory to be clean after a "git reset --hard". But not
here:
  git reset --hard &&
  git status
will still show "mixed" as modified. Because of course we are checking
out the version from HEAD into the index and working tree, which has the
mixed line endings. So we rewrite the identical file.
So that kind of makes sense. But it isn't all that helpful, if I just
want to reset my working tree to something sane without making a new
commit (more on this later).
But here's an extra helping of confusion on top. Every once in a while,
doing the reset _won't_ keep "mixed" as modified. I can trigger it
reliably by inserting an extra sleep into git:
diff --git a/unpack-trees.c b/unpack-trees.c
index 500ebcf..735b13e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -223,6 +223,7 @@ static int check_updates(struct unpack_trees_options *o)
 		}
 	}
 	stop_progress(&progress);
+	sleep(1);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
 	return errs != 0;
That puts a delay between when reset writes the "mixed" file, and when
we write out the refreshed index. So next time we look at the index
(e.g., in "status"), we will see that the "mixed" entry has up-to-date
stat information and not look at its actual contents.
But in the original case (without the sleep), that doesn't happen.
There, we usually end up writing the file and the index in the same
second. So when status looks at the index, the "mixed" entry is racily
clean, and we actually check it again.
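The two cases can be sketched as a plain timestamp comparison. This is
a simplified model of the racy-entry check, assuming whole-second
timestamps and stat data that otherwise matches; the helper name and
the timestamp values are invented for illustration.

```shell
# needs_content_check ENTRY_MTIME INDEX_MTIME
# An entry whose mtime is the same as or newer than the index file's
# timestamp cannot be trusted on stat data alone.
needs_content_check() {
	if [ "$1" -ge "$2" ]
	then
		echo "racy: compare contents"
	else
		echo "stat matches: trust index"
	fi
}

# File and index written in the same second (the usual case).
needs_content_check 1000 1000

# Index written one second after the file (the sleep-patched case).
needs_content_check 1000 1001
```

The first call reports that contents must be compared, which is where
the line-ending conversion makes "mixed" look modified. The second
trusts the stat data, so "mixed" looks clean.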
So we get two different outcomes, depending on the index raciness. Which
one is right, or is it right for it to be non-deterministic?
And one final question. Let's say I don't immediately convert this mixed
file to the correct line-endings. Instead, it persists over a large
number of commits, some of them even changing the "mixed" file but not
fixing the line endings[1]. We can simulate that with:
  mv .gitattributes tmp &&
  echo three >>mixed &&
  git commit -a -m three &&
  mv tmp .gitattributes
Now imagine I am somebody who has cloned this repo; the clone will tend
to end the race condition in the "clean" state, since it will often take
more than 1 second to write out all of the files (at least for a
normal-sized project). We can simulate using our sleep-patched reset:
  git reset --hard
to get a "clean" repo. Now let's say I want to explore old history, so I
go to a detached HEAD, but using normal git, not the sleep-patched one:
  git checkout HEAD^
And, of course, now we think "mixed" is modified. After I'm done
exploring, I want to go back to "master", but I can't:
  $ git checkout master
  error: Your local changes to the following files would be overwritten by checkout:
          mixed
What is the best way out of this situation? You can't use "reset --hard"
to fix the working tree. I guess "git checkout -f" is the best option.
Hopefully my example made sense and was reproducible. The real repo
which triggered this puzzle was jquery. You can try:
  git clone git://github.com/jquery/jquery.git &&
  cd jquery &&
  git checkout 1.4.2 &&
  git checkout master
which will fail (but may succeed racily on a slow enough machine).
Obviously they need to fix the mixed line-ending files in their repo.
But that fix would be on HEAD, and "git checkout 1.4.2" will be forever
broken. Is there a way to fix that?
-Peff
[1] The one thing still puzzling me about the jquery repo is how they
managed to make so many commits (including ones to mixed line ending
files) without seeing the dirty working tree state and committing it. Is
there some combination of config that makes this not happen?