* [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
From: Finn Arne Gangstad @ 2010-05-10 17:11 UTC (permalink / raw)
  To: git, msysgit; +Cc: Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov

Previously, autocrlf would only work well for normalized
repositories. Any text files that contained CRLF in the repository
would cause problems, and would be modified when handled with
core.autocrlf set.

Change autocrlf so that it does no conversion on files that already
contain a CR in the repository. git with autocrlf set will never
create such a file, or change an LF-only file to contain CRs, so the
(new) assumption is that if a file contains a CR, it is intentional,
and autocrlf should not change that.

The following sequence should now always be a NOP even with autocrlf
set (assuming a clean working directory):

git checkout <something>
touch *
git add -A .    (will add nothing)
git commit      (nothing to commit)

Previously this would break for any text file containing a CR.

Signed-off-by: Finn Arne Gangstad <finag@pvv.org>
---

Some of you may have been following Eyvind's excellent thread about
trying to make end-of-line translation in git a bit smoother.

I decided to attack the problem from a different angle: Is it possible
to make autocrlf behave non-destructively for all the previous problem cases?

Stealing the problem from Eyvind's initial mail (paraphrased and
summarized a bit):

1. Setting autocrlf globally is a pain since autocrlf does not work well
   with CRLF in the repo
2. Setting it in individual repos is hard since you do it "too late"
   (the clone will get it wrong)
3. If someone checks in a file with CRLF later, you get into problems again
4. If a repository once has contained CRLF, you can't tell autocrlf
   at which commit everything is sane again
5. autocrlf does needless work if you know that all your users want
   the same EOL style.

I believe that this patch makes autocrlf a safe (and good) default
setting for Windows, and that it solves problems 1-4.

I implemented it by looking for CR characters in the index, and
aborting any conversion attempt if one is found. The code to read
the index contents was copied pretty much verbatim from attr.c, and
should probably be made into a non-static function instead if there
is no better way of doing this.
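
To make the rule concrete, here is a minimal standalone sketch of the
decision in both directions. It is not the patch itself: the names
blob_has_cr(), count_crlf(), sketch_to_git() and sketch_to_worktree()
are made up for illustration, the index contents are passed in as a
plain buffer instead of being read with read_sha1_file(), and the
binary and bare-CR heuristics that convert.c applies are left out.

```c
#include <assert.h>	/* only needed if you add checks like the ones described below */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the blob currently in the index contain a CR? */
static int blob_has_cr(const char *blob, size_t len)
{
	return len && memchr(blob, '\r', len) != NULL;
}

/* Hypothetical helper: count CRLF pairs in a buffer. */
static size_t count_crlf(const char *buf, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i + 1 < len; i++)
		if (buf[i] == '\r' && buf[i + 1] == '\n')
			n++;
	return n;
}

/*
 * Commit side (worktree -> repository).
 * index_blob is the version already in the index, or NULL for a new
 * path. dst must have room for len bytes. Returns the output length.
 */
static size_t sketch_to_git(const char *index_blob, size_t index_len,
			    const char *src, size_t len, char *dst)
{
	size_t i, out = 0;

	/* The indexed version has a CR: assume it is intentional, do not convert. */
	if (index_blob && blob_has_cr(index_blob, index_len)) {
		memcpy(dst, src, len);
		return len;
	}
	/* Otherwise drop the CR of every CRLF pair. */
	for (i = 0; i < len; i++) {
		if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
			continue;
		dst[out++] = src[i];
	}
	return out;
}

/*
 * Checkout side (repository -> worktree).
 * dst must have room for 2 * len bytes. Returns the output length.
 */
static size_t sketch_to_worktree(const char *src, size_t len, char *dst)
{
	size_t i, out = 0;

	/* Any CRLF at all in the blob: leave the content untouched. */
	if (count_crlf(src, len) > 0) {
		memcpy(dst, src, len);
		return len;
	}
	/* LF-only content: expand every LF to CRLF. */
	for (i = 0; i < len; i++) {
		if (src[i] == '\n')
			dst[out++] = '\r';
		dst[out++] = src[i];
	}
	return out;
}
```

With these two functions, a blob such as "a\r\nb\n" comes back
byte-for-byte identical after sketch_to_worktree() followed by
sketch_to_git(), and so does an LF-only blob such as "a\nb\n", which
is the round-trip property the checkout/touch/add sequence above
relies on.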

Note that ALL the tests still pass unmodified. This is perhaps a bit
surprising, but I think it is an indication that no one ever
intended autocrlf to do what it does to files containing CRs.


 convert.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/convert.c b/convert.c
index 4f8fcb7..9d062c8 100644
--- a/convert.c
+++ b/convert.c
@@ -120,6 +120,43 @@ static void check_safe_crlf(const char *path, int action,
 	}
 }
 
+static int has_cr_in_index(const char *path)
+{
+	int pos, len;
+	unsigned long sz;
+	enum object_type type;
+	void *data;
+	int has_cr;
+	struct index_state *istate = &the_index;
+
+	len = strlen(path);
+	pos = index_name_pos(istate, path, len);
+	if (pos < 0) {
+		/*
+		 * We might be in the middle of a merge, in which
+		 * case we would read stage #2 (ours).
+		 */
+		int i;
+		for (i = -pos - 1;
+		     (pos < 0 && i < istate->cache_nr &&
+		      !strcmp(istate->cache[i]->name, path));
+		     i++)
+			if (ce_stage(istate->cache[i]) == 2)
+				pos = i;
+	}
+	if (pos < 0)
+		return 0;
+	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return 0;
+	}
+
+	has_cr = memchr(data, '\r', sz) != NULL;
+	free(data);
+	return has_cr;
+}
+
 static int crlf_to_git(const char *path, const char *src, size_t len,
                        struct strbuf *buf, int action, enum safe_crlf checksafe)
 {
@@ -147,6 +184,10 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			return 0;
 	}
 
+	/* If the file in the index has any CR in it, do not convert. */
+	if (has_cr_in_index(path))
+		return 0;
+
 	check_safe_crlf(path, action, &stats, checksafe);
 
 	/* Optimization: No CR? Nothing to convert, regardless. */
@@ -202,6 +243,10 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 	if (stats.lf == stats.crlf)
 		return 0;
 
+	/* Are there ANY lines at all with CRLF? If so, ignore */
+	if (stats.crlf > 0)
+		return 0;
+
 	if (action == CRLF_GUESS) {
 		/* If we have any bare CR characters, we're not going to touch it */
 		if (stats.cr != stats.crlf)
-- 
1.7.1.1.g653e8
