* [PATCH v3 0/5] End-of-line normalization, redesigned
@ 2010-05-12 23:00 Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 1/5] autocrlf: Make it work also for un-normalized repositories Eyvind Bernhardsen
` (4 more replies)
0 siblings, 5 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-12 23:00 UTC
To: git
Cc: msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov,
Robert Buck, Finn Arne Gangstad, Jay Soffian
After Finn Arne's bombshell of a patch, I was almost ready to throw in
the towel on this series. Then I realized that just because autocrlf
is safe to use now doesn't mean it solves my CRLF-related problems.

The reason is that since autocrlf no longer requires your text files
to be normalized, it also no longer guarantees that they are. If you
need to interoperate with some other SCM, have tools that require a
specific line ending, or just like your repository free of CR
characters, autocrlf won't get you there.

This series does. There have been some changes since v2:

- Series is now based on Finn Arne's "safe autocrlf" patch (I took the
  one from "pu" since Junio seems to have fixed some whitespace
  damage).

- Removed core.eolStyle. This gets more explanation below.

- Added "crlf=lf" and "crlf=crlf"; they turn on normalization and
  convert line endings to LF or CRLF on checkout, respectively. Yes,
  I know.

- RFC patch: As promised, rename "crlf" attribute as "eolconv",
  keeping "crlf" as an alias for backwards compatibility. I think
  this one might be worth it, but perhaps not as implemented (see the
  fix I made for git-cvsserver.perl to understand why).

- RFC patch: Rename "core.autocrlf" as "core.eolconv". This one is
  mainly for fun, not so much for inclusion: it might have the same
  problems as adding an alias for "crlf" and I'm not too bothered
  about the name any more anyway, as I'll explain below.

So if I've removed eolStyle, how does the user say what line endings
to use for a normalized text file in the working directory? Using
"core.autocrlf". There are three reasons why that isn't completely
insane:

1. A user who wants CRLFs in text files probably doesn't want them
   just in files that happen to have normalized line endings.

2. You can force CRLF in the working directory now, so if you just
   want .vcproj files and the like to have CRLFs, you check in a
   .gitattributes containing "*.vcproj crlf=crlf" or add that line to
   your .git/info/attributes. No need to use autocrlf at all.

3. With the "safe autocrlf" patch, core.autocrlf is actually safe to
   use in a non-normalized repository, so "core.autocrlf=true" is no
   longer an insane default.

Given the intended usage for autocrlf it's not even a particularly bad
name any more: "I don't care how you do it, I just want CRLFs in my
text files". Even "autocrlf=input" isn't that bad if you squint a
bit. After a few beers.

Summary: the new "core.autocrlf" is for when you don't want to mess up
an existing repository with unwanted CRLFs, and the new "crlf"
mechanisms are for normalizing text files.
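
To make the split between the two mechanisms concrete, here is a rough sketch of when a file would be normalized to LF on 'git add', as I read the rules in this series. It is an illustrative model only, not code from the patches: the enum, the function name and the boolean parameters are all made up for the example, and core.safecrlf is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the "crlf" attribute states used in this series. */
enum eol_attr {
	ATTR_UNSPECIFIED,	/* no "crlf" attribute: core.autocrlf decides */
	ATTR_BINARY,		/* "-crlf": never convert */
	ATTR_TEXT,		/* "crlf": always normalize */
	ATTR_LF,		/* "crlf=lf" */
	ATTR_CRLF,		/* "crlf=crlf" */
	ATTR_AUTO		/* "crlf=auto": normalize if it looks like text */
};

/*
 * Would a file be normalized to LF when it is added?
 *
 * autocrlf_set: core.autocrlf is "true" or "input"
 * looks_text:   the content heuristic says the file is text
 * cr_in_index:  the version already in the index contains a CR
 */
bool normalizes_on_add(enum eol_attr attr, bool autocrlf_set,
		       bool looks_text, bool cr_in_index)
{
	switch (attr) {
	case ATTR_BINARY:
		return false;
	case ATTR_TEXT:
	case ATTR_LF:
	case ATTR_CRLF:
		return true;		/* forced, no guessing */
	case ATTR_AUTO:
		return looks_text;	/* independent of core.autocrlf */
	case ATTR_UNSPECIFIED:
		/* "safe autocrlf": leave files that already have CRs alone */
		return autocrlf_set && looks_text && !cr_in_index;
	}
	assert(!"unknown attribute state");
	return false;
}
```

The last case is the "don't mess up an existing repository" half of the summary; the attribute cases are the "normalize text files" half.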
Eyvind Bernhardsen (4):
Add tests for per-repository eol normalization
Add per-repository eol normalization
Rename "crlf" attribute as "eolconv"
Rename "core.autocrlf" config variable as "core.eolconv"
Finn Arne Gangstad (1):
autocrlf: Make it work also for un-normalized repositories
Documentation/config.txt | 26 ++++---
Documentation/gitattributes.txt | 157 ++++++++++++++++++++++++++++++---------
attr.c | 2 +-
cache.h | 9 ++-
config.c | 13 ++-
convert.c | 115 +++++++++++++++++++++++-----
environment.c | 2 +-
git-cvsserver.perl | 8 ++-
t/t0020-crlf.sh | 106 ++++++++++++++++++++++++++
t/t0025-crlf-auto.sh | 134 +++++++++++++++++++++++++++++++++
10 files changed, 497 insertions(+), 75 deletions(-)
create mode 100755 t/t0025-crlf-auto.sh
* [PATCH v3 1/5] autocrlf: Make it work also for un-normalized repositories
2010-05-12 23:00 [PATCH v3 0/5] End-of-line normalization, redesigned Eyvind Bernhardsen
@ 2010-05-12 23:00 ` Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 2/5] Add tests for per-repository eol normalization Eyvind Bernhardsen
` (3 subsequent siblings)
4 siblings, 0 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-12 23:00 UTC
To: git
Cc: msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov,
Robert Buck, Finn Arne Gangstad, Jay Soffian
From: Finn Arne Gangstad <finnag@pvv.org>
Previously, autocrlf would only work well for normalized
repositories. Any text files that contained CRLF in the repository
would cause problems, and would be modified when handled with
core.autocrlf set.

Change autocrlf so that it does not convert files that already
contain a CR in the repository. git with autocrlf set will never
create such a file, or change an LF-only file to contain CRs, so the
(new) assumption is that if a file contains a CR, it is intentional,
and autocrlf should not change that.

The following sequence should now always be a NOP even with autocrlf
set (assuming a clean working directory):

    git checkout <something>
    touch *
    git add -A .    (will add nothing)
    git commit      (nothing to commit)

Previously this would break for any text file containing a CR.

Some of you may have been following Eyvind's excellent thread about
trying to make end-of-line translation in git a bit smoother.

I decided to attack the problem from a different angle: Is it possible
to make autocrlf behave non-destructively for all the previous problem cases?

Stealing the problem from Eyvind's initial mail (paraphrased and
summarized a bit):

1. Setting autocrlf globally is a pain since autocrlf does not work well
   with CRLF in the repo
2. Setting it in individual repos is hard since you do it "too late"
   (the clone will get it wrong)
3. If someone checks in a file with CRLF later, you get into problems again
4. If a repository once has contained CRLF, you can't tell autocrlf
   at which commit everything is sane again
5. autocrlf does needless work if you know that all your users want
   the same EOL style.

I believe that this patch makes autocrlf a safe (and good) default
setting for Windows, and that it solves problems 1-4 (it solves 2 by
being set by default, which is early enough for clone).

I implemented it by looking for CR characters in the index, and
aborting any conversion attempt if one is found.
Signed-off-by: Finn Arne Gangstad <finnag@pvv.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
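
The index-side check described in the commit message boils down to a single scan of the blob for a carriage return. A minimal standalone sketch, assuming the blob contents are already in memory; the helper name `buffer_has_cr` is hypothetical and not part of the patch, which reads the blob with read_sha1_file() first:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: report whether a buffer
 * contains at least one CR byte.  The patch applies the same memchr()
 * test to the blob it reads from the index.
 */
int buffer_has_cr(const void *data, size_t len)
{
	/* A NULL buffer is only acceptable when it is empty. */
	assert(data != NULL || len == 0);
	if (len == 0)
		return 0;
	return memchr(data, '\r', len) != NULL;
}
```

A single CR anywhere in the blob is enough to switch conversion off for that path, which is why a mixed-line-ending file is left alone just like an all-CRLF one.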
convert.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
t/t0020-crlf.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 0 deletions(-)
diff --git a/convert.c b/convert.c
index 4f8fcb7..46622b0 100644
--- a/convert.c
+++ b/convert.c
@@ -120,6 +120,43 @@ static void check_safe_crlf(const char *path, int action,
}
}
+static int has_cr_in_index(const char *path)
+{
+ int pos, len;
+ unsigned long sz;
+ enum object_type type;
+ void *data;
+ int has_cr;
+ struct index_state *istate = &the_index;
+
+ len = strlen(path);
+ pos = index_name_pos(istate, path, len);
+ if (pos < 0) {
+ /*
+ * We might be in the middle of a merge, in which
+ * case we would read stage #2 (ours).
+ */
+ int i;
+ for (i = -pos - 1;
+ (pos < 0 && i < istate->cache_nr &&
+ !strcmp(istate->cache[i]->name, path));
+ i++)
+ if (ce_stage(istate->cache[i]) == 2)
+ pos = i;
+ }
+ if (pos < 0)
+ return 0;
+ data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return 0;
+ }
+
+ has_cr = memchr(data, '\r', sz) != NULL;
+ free(data);
+ return has_cr;
+}
+
static int crlf_to_git(const char *path, const char *src, size_t len,
struct strbuf *buf, int action, enum safe_crlf checksafe)
{
@@ -145,6 +182,13 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
*/
if (is_binary(len, &stats))
return 0;
+
+ /*
+ * If the file in the index has any CR in it, do not convert.
+ * This is the new safer autocrlf handling.
+ */
+ if (has_cr_in_index(path))
+ return 0;
}
check_safe_crlf(path, action, &stats, checksafe);
@@ -203,6 +247,11 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
return 0;
if (action == CRLF_GUESS) {
+ /* If we have any CR or CRLF line endings, we do not touch it */
+ /* This is the new safer autocrlf-handling */
+ if (stats.cr > 0 || stats.crlf > 0)
+ return 0;
+
/* If we have any bare CR characters, we're not going to touch it */
if (stats.cr != stats.crlf)
return 0;
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index c3e7e32..234a94f 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -453,5 +453,57 @@ test_expect_success 'invalid .gitattributes (must not crash)' '
git diff
'
+# Some more tests here to add new autocrlf functionality.
+# We want to have a known state here, so start a bit from scratch
+
+test_expect_success 'setting up for new autocrlf tests' '
+ git config core.autocrlf false &&
+ git config core.safecrlf false &&
+ rm -rf .????* * &&
+ for w in I am all LF; do echo $w; done >alllf &&
+ for w in Oh here is CRLFQ in text; do echo $w; done | q_to_cr >mixed &&
+ for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
+ git add -A . &&
+ git commit -m "alllf, allcrlf and mixed only" &&
+ git tag -a -m "message" autocrlf-checkpoint
+'
+
+test_expect_success 'report no change after setting autocrlf' '
+ git config core.autocrlf true &&
+ touch * &&
+ git diff --exit-code
+'
+
+test_expect_success 'files are clean after checkout' '
+ rm * &&
+ git checkout -f &&
+ git diff --exit-code
+'
+
+cr_to_Q_no_NL () {
+ tr '\015' Q | tr -d '\012'
+}
+
+test_expect_success 'LF only file gets CRLF with autocrlf' '
+ test "$(cr_to_Q_no_NL < alllf)" = "IQamQallQLFQ"
+'
+
+test_expect_success 'Mixed file is still mixed with autocrlf' '
+ test "$(cr_to_Q_no_NL < mixed)" = "OhhereisCRLFQintext"
+'
+
+test_expect_success 'CRLF only file has CRLF with autocrlf' '
+ test "$(cr_to_Q_no_NL < allcrlf)" = "IQamQallQCRLFQ"
+'
+
+test_expect_success 'New CRLF file gets LF in repo' '
+ tr -d "\015" < alllf | append_cr > alllf2 &&
+ git add alllf2 &&
+ git commit -m "alllf2 added" &&
+ git config core.autocrlf false &&
+ rm * &&
+ git checkout -f &&
+ test_cmp alllf alllf2
+'
test_done
--
1.7.1.3.g448cb.dirty
* [PATCH v3 2/5] Add tests for per-repository eol normalization
2010-05-12 23:00 [PATCH v3 0/5] End-of-line normalization, redesigned Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 1/5] autocrlf: Make it work also for un-normalized repositories Eyvind Bernhardsen
@ 2010-05-12 23:00 ` Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 3/5] Add " Eyvind Bernhardsen
` (2 subsequent siblings)
4 siblings, 0 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-12 23:00 UTC
To: git
Cc: msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov,
Robert Buck, Finn Arne Gangstad, Jay Soffian
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
t/t0025-crlf-auto.sh | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 121 insertions(+), 0 deletions(-)
create mode 100755 t/t0025-crlf-auto.sh
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
new file mode 100755
index 0000000..40048a7
--- /dev/null
+++ b/t/t0025-crlf-auto.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description='CRLF conversion'
+
+. ./test-lib.sh
+
+has_cr() {
+ tr '\015' Q <"$1" | grep Q >/dev/null
+}
+
+test_expect_success setup '
+
+ git config core.autocrlf false &&
+
+ for w in Hello world how are you; do echo $w; done >one &&
+ for w in I am very very fine thank you; do echo ${w}Q; done | q_to_cr >two &&
+ git add . &&
+
+ git commit -m initial &&
+
+ one=`git rev-parse HEAD:one` &&
+ two=`git rev-parse HEAD:two` &&
+
+ for w in Some extra lines here; do echo $w; done >>one &&
+ git diff >patch.file &&
+ patched=`git hash-object --stdin <one` &&
+ git read-tree --reset -u HEAD &&
+
+ echo happy.
+'
+
+test_expect_success 'default settings cause no changes' '
+
+ rm -f .gitattributes tmp one two &&
+ git read-tree --reset -u HEAD &&
+
+ ! has_cr one &&
+ has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_failure 'crlf=true causes a CRLF file to be normalized' '
+
+ rm -f .gitattributes tmp one two &&
+ echo "two crlf" > .gitattributes &&
+ git read-tree --reset -u HEAD &&
+
+ # Note, "normalized" means that git will normalize it if added
+ has_cr two &&
+ twodiff=`git diff two` &&
+ test -n "$twodiff"
+'
+
+test_expect_failure 'crlf=crlf gives a normalized file CRLFs with autocrlf=false' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.autocrlf false &&
+ echo "one crlf=crlf" > .gitattributes &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ onediff=`git diff one` &&
+ test -z "$onediff"
+'
+
+test_expect_failure 'crlf=crlf gives a normalized file CRLFs with autocrlf=input' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.autocrlf input &&
+ echo "one crlf=crlf" > .gitattributes &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ onediff=`git diff one` &&
+ test -z "$onediff"
+'
+
+test_expect_failure 'crlf=lf gives a normalized file LFs with autocrlf=true' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.autocrlf true &&
+ echo "one crlf=lf" > .gitattributes &&
+ git read-tree --reset -u HEAD &&
+
+ ! has_cr one &&
+ onediff=`git diff one` &&
+ test -z "$onediff"
+'
+
+test_expect_success 'autocrlf=true does not normalize CRLF files' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.autocrlf true &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_failure 'crlf=auto, autocrlf=true _does_ normalize CRLF files' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.autocrlf true &&
+ echo "* crlf=auto" > .gitattributes &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -n "$twodiff"
+'
+
+# look through the logic changes and find the corner cases
+
+test_done
--
1.7.1.3.g448cb.dirty
* [PATCH v3 3/5] Add per-repository eol normalization
2010-05-12 23:00 [PATCH v3 0/5] End-of-line normalization, redesigned Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 1/5] autocrlf: Make it work also for un-normalized repositories Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 2/5] Add tests for per-repository eol normalization Eyvind Bernhardsen
@ 2010-05-12 23:00 ` Eyvind Bernhardsen
2010-05-12 23:00 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
2010-05-12 23:00 ` [RFC/PATCH v3 5/5] Rename "core.autocrlf" config variable as "core.eolconv" Eyvind Bernhardsen
4 siblings, 0 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-12 23:00 UTC
To: git
Cc: msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov,
Robert Buck, Finn Arne Gangstad, Jay Soffian
Change the semantics of the "crlf" attribute so that it enables
end-of-line normalization when it is set, regardless of "core.autocrlf".

Add new settings for "crlf": "auto", which enables end-of-line
conversion but does not override the automatic text file detection, and
"crlf" and "lf", which force normalization of the file and set which
line ending it should have in the working directory.

The effect of this change is that a project can enable end-of-line
normalization for all files. This is similar to the "core.autocrlf"
configuration variable, but since the setting is part of the content, it
is cloned when the project is cloned and can be changed if a previously
un-normalized repository is normalized.

The line ending style to be used for normalized text files in the
working directory is set using "core.autocrlf". When it is set to
"true", CRLFs are used in the working directory; when set to "input" or
"false", LFs are used.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
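
For reviewers, the checkout-side rule in the commit message above can be sketched as follows. This is a simplified model under my own naming (the enums and `may_add_crlf_on_checkout` are hypothetical, not identifiers from convert.c), and it only answers whether CRLF conversion is considered at all; the content checks that follow for guessed text are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of core.autocrlf. */
enum autocrlf_cfg { AUTOCRLF_FALSE, AUTOCRLF_TRUE, AUTOCRLF_INPUT };

/* Hypothetical model of the "crlf" attribute states. */
enum eol_attr {
	ATTR_UNSPECIFIED,	/* no "crlf" attribute */
	ATTR_BINARY,		/* "-crlf" */
	ATTR_TEXT,		/* "crlf" */
	ATTR_LF,		/* "crlf=lf" (or "crlf=input") */
	ATTR_CRLF,		/* "crlf=crlf" */
	ATTR_AUTO		/* "crlf=auto" */
};

/* Is LF -> CRLF conversion considered when the file is checked out? */
bool may_add_crlf_on_checkout(enum eol_attr attr, enum autocrlf_cfg cfg)
{
	assert(cfg == AUTOCRLF_FALSE || cfg == AUTOCRLF_TRUE ||
	       cfg == AUTOCRLF_INPUT);

	if (attr == ATTR_BINARY || attr == ATTR_LF)
		return false;		/* never, whatever the config says */
	if (attr == ATTR_CRLF)
		return true;		/* always, whatever the config says */
	/* "crlf", "crlf=auto" and unspecified all defer to core.autocrlf */
	return cfg == AUTOCRLF_TRUE;
}
```

Only "crlf=crlf" and "crlf=lf" pin the working directory format; every other state lets the user's core.autocrlf decide, which is what lets core.eolStyle go away.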
Documentation/config.txt | 4 +-
Documentation/gitattributes.txt | 142 +++++++++++++++++++++++++++++++--------
cache.h | 9 ++-
config.c | 2 +-
convert.c | 71 ++++++++++++-------
environment.c | 2 +-
t/t0025-crlf-auto.sh | 10 ++--
7 files changed, 176 insertions(+), 64 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 92f851e..4d3c472 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -208,8 +208,8 @@ core.autocrlf::
based on the file's contents. See linkgit:gitattributes[5].
core.safecrlf::
- If true, makes git check if converting `CRLF` as controlled by
- `core.autocrlf` is reversible. Git will verify if a command
+ If true, makes git check if converting `CRLF` is reversible when
+ end-of-line conversion is active. Git will verify if a command
modifies a file in the work tree either directly or indirectly.
For example, committing a file followed by checking out the
same file should yield the original file in the work tree. If
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index d892e64..bb3b446 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -95,50 +95,136 @@ repository upon 'git add' and 'git commit'.
`crlf`
^^^^^^
-This attribute controls the line-ending convention.
+This attribute enables and controls end-of-line normalization. When a
+text file is normalized, its line endings are converted to LF in the
+repository. Text files can have their line endings converted to
+CRLF in the working directory, using the `crlf` attribute for
+individual files or the `core.autocrlf` configuration variable for all
+files.
Set::
- Setting the `crlf` attribute on a path is meant to mark
- the path as a "text" file. 'core.autocrlf' conversion
- takes place without guessing the content type by
- inspection.
+ Setting the `crlf` attribute on a path enables end-of-line
+ normalization and marks the path as a text file. End-of-line
+ conversion takes place without guessing the content type.
Unset::
Unsetting the `crlf` attribute on a path tells git not to
attempt any end-of-line conversion upon checkin or checkout.
-Unspecified::
+Set to string value "auto"::
+
+ When `crlf` is set to "auto", the path is marked for automatic
+ end-of-line normalization. If git decides that the content is
+ text, its line endings are normalized to LF on checkin.
- Unspecified `crlf` attribute tells git to apply the
- `core.autocrlf` conversion when the file content looks
- like text.
+Set to string value "crlf"::
-Set to string value "input"::
+ This is similar to setting the attribute to `true`, but forces
+ git to convert line endings to CRLF when the file is checked
+ out, regardless of `core.autocrlf`.
+
+Set to string value "lf"::
This is similar to setting the attribute to `true`, but
- also forces git to act as if `core.autocrlf` is set to
- `input` for the path.
+ prevents git from converting line endings to CRLF when the
+ file is checked out, regardless of `core.autocrlf`. "input"
+ is an alias for "lf".
-Any other value set to `crlf` attribute is ignored and git acts
-as if the attribute is left unspecified.
+Unspecified::
+ Leaving the `crlf` attribute unspecified tells git to apply
+ end-of-line normalization only if the `core.autocrlf`
+ configuration variable is set, the content appears to be text,
+ and the file is either new or already normalized in the
+ repository.
-The `core.autocrlf` conversion
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Any other value causes git to act as if `crlf` has been left
+unspecified.
+
+
+End-of-line conversion
+^^^^^^^^^^^^^^^^^^^^^^
+
+While git normally leaves file contents alone, it can be configured to
+normalize line endings to LF in the repository and, optionally, to
+convert them to CRLF when files are checked out.
+
+Here is an example that will make git normalize .txt, .vcproj and .sh
+files, ensure that .vcproj files have CRLF and .sh files have LF in
+the working directory, and prevent .jpg files from being normalized
+regardless of their content.
+
+------------------------
+*.txt crlf
+*.vcproj crlf=crlf
+*.sh crlf=lf
+*.jpg -crlf
+------------------------
+
+Other source code management systems normalize all text files in their
+repositories, and there are two ways to enable similar automatic
+normalization in git.
-If the configuration variable `core.autocrlf` is false, no
-conversion is done.
+If you simply want to have CRLF line endings in your working directory
+regardless of the repository you are working in, you can set the
+config variable "core.autocrlf" without changing any attributes.
-When `core.autocrlf` is true, it means that the platform wants
-CRLF line endings for files in the working tree, and you want to
-convert them back to the normal LF line endings when checking
-in to the repository.
+------------------------
+[core]
+ autocrlf = true
+------------------------
+
+This does not force normalization of all text files, but does ensure
+that text files that you introduce to the repository have their line
+endings normalized to LF when they are added, and that files that are
+already normalized in the repository stay normalized. You can also
+set `autocrlf` to "input" to have automatic normalization of new text
+files without conversion to CRLF in the working directory.
+
+If you want to interoperate with a source code management system that
+enforces end-of-line normalization, or you simply want all text files
+in your repository to be normalized, you should instead set the `crlf`
+attribute to "auto" for _all_ files.
+
+------------------------
+* crlf=auto
+------------------------
-When `core.autocrlf` is set to "input", line endings are
-converted to LF upon checkin, but there is no conversion done
-upon checkout.
+This ensures that all files that git considers to be text will have
+normalized (LF) line endings in the repository.
+
+NOTE: When `crlf=auto` normalization is enabled in an existing
+repository, any text files containing CRLFs should be normalized. If
+they are not they will be normalized the next time someone tries to
+change them, causing unfortunate misattribution. From a clean working
+directory:
+
+-------------------------------------------------
+$ echo "* crlf=auto" >>.gitattributes
+ # ...this should be the first line in .gitattributes
+$ rm .git/index # Remove the index to force git to
+$ git reset # re-scan the working directory
+$ git status # Show files that will be normalized
+$ git add -u
+$ git add .gitattributes
+$ git commit -m "Introduce end-of-line normalization"
+-------------------------------------------------
+
+If any files that should not be normalized show up in 'git status',
+unset their `crlf` attribute before running 'git add -u'.
+
+------------------------
+manual.pdf -crlf
+------------------------
+
+Conversely, text files that git does not detect can have normalization
+enabled manually.
+
+------------------------
+weirdchars.txt crlf
+------------------------
If `core.safecrlf` is set to "true" or "warn", git verifies if
the conversion is reversible for the current setting of
diff --git a/cache.h b/cache.h
index 5eb0573..d1f669e 100644
--- a/cache.h
+++ b/cache.h
@@ -547,7 +547,6 @@ extern int core_compression_seen;
extern size_t packed_git_window_size;
extern size_t packed_git_limit;
extern size_t delta_base_cache_limit;
-extern int auto_crlf;
extern int read_replace_refs;
extern int fsync_object_files;
extern int core_preload_index;
@@ -561,6 +560,14 @@ enum safe_crlf {
extern enum safe_crlf safe_crlf;
+enum auto_crlf {
+ AUTO_CRLF_FALSE = 0,
+ AUTO_CRLF_TRUE = 1,
+ AUTO_CRLF_INPUT = -1,
+};
+
+extern enum auto_crlf auto_crlf;
+
enum branch_track {
BRANCH_TRACK_UNSPECIFIED = -1,
BRANCH_TRACK_NEVER = 0,
diff --git a/config.c b/config.c
index 6963fbe..b60a1ff 100644
--- a/config.c
+++ b/config.c
@@ -461,7 +461,7 @@ static int git_default_core_config(const char *var, const char *value)
if (!strcmp(var, "core.autocrlf")) {
if (value && !strcasecmp(value, "input")) {
- auto_crlf = -1;
+ auto_crlf = AUTO_CRLF_INPUT;
return 0;
}
auto_crlf = git_config_bool(var, value);
diff --git a/convert.c b/convert.c
index 46622b0..0eb3d4b 100644
--- a/convert.c
+++ b/convert.c
@@ -8,13 +8,17 @@
* This should use the pathname to decide on whether it wants to do some
* more interesting conversions (automatic gzip/unzip, general format
* conversions etc etc), but by default it just does automatic CRLF<->LF
- * translation when the "auto_crlf" option is set.
+ * translation when the "crlf" attribute or "auto_crlf" option is set.
*/
-#define CRLF_GUESS (-1)
-#define CRLF_BINARY 0
-#define CRLF_TEXT 1
-#define CRLF_INPUT 2
+enum action {
+ CRLF_GUESS = -1,
+ CRLF_BINARY = 0,
+ CRLF_TEXT,
+ CRLF_INPUT,
+ CRLF_CRLF,
+ CRLF_AUTO,
+};
struct text_stat {
/* NUL, CR, LF and CRLF counts */
@@ -89,13 +93,14 @@ static int is_binary(unsigned long size, struct text_stat *stats)
return 0;
}
-static void check_safe_crlf(const char *path, int action,
+static void check_safe_crlf(const char *path, enum action action,
struct text_stat *stats, enum safe_crlf checksafe)
{
if (!checksafe)
return;
- if (action == CRLF_INPUT || auto_crlf <= 0) {
+ if (action == CRLF_INPUT ||
+ (action == CRLF_GUESS && auto_crlf == AUTO_CRLF_INPUT)) {
/*
* CRLFs would not be restored by checkout:
* check if we'd remove CRLFs
@@ -106,7 +111,8 @@ static void check_safe_crlf(const char *path, int action,
else /* i.e. SAFE_CRLF_FAIL */
die("CRLF would be replaced by LF in %s.", path);
}
- } else if (auto_crlf > 0) {
+ } else if (action == CRLF_CRLF ||
+ (action == CRLF_GUESS && auto_crlf == AUTO_CRLF_TRUE)) {
/*
* CRLFs would be added by checkout:
* check if we have "naked" LFs
@@ -157,18 +163,23 @@ static int has_cr_in_index(const char *path)
return has_cr;
}
+static int should_guess_text(enum action action) {
+ return (action == CRLF_GUESS || action == CRLF_AUTO);
+}
+
static int crlf_to_git(const char *path, const char *src, size_t len,
- struct strbuf *buf, int action, enum safe_crlf checksafe)
+ struct strbuf *buf, enum action action, enum safe_crlf checksafe)
{
struct text_stat stats;
char *dst;
- if ((action == CRLF_BINARY) || !auto_crlf || !len)
+ if (action == CRLF_BINARY ||
+ (action == CRLF_GUESS && auto_crlf == AUTO_CRLF_FALSE) || !len)
return 0;
gather_stats(src, len, &stats);
- if (action == CRLF_GUESS) {
+ if (should_guess_text(action)) {
/*
* We're currently not going to even try to convert stuff
* that has bare CR characters. Does anybody do that crazy
@@ -183,12 +194,14 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
if (is_binary(len, &stats))
return 0;
- /*
- * If the file in the index has any CR in it, do not convert.
- * This is the new safer autocrlf handling.
- */
- if (has_cr_in_index(path))
- return 0;
+ if (action == CRLF_GUESS) {
+ /*
+ * If the file in the index has any CR in it, do not convert.
+ * This is the new safer autocrlf handling.
+ */
+ if (has_cr_in_index(path))
+ return 0;
+ }
}
check_safe_crlf(path, action, &stats, checksafe);
@@ -201,7 +214,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
if (strbuf_avail(buf) + buf->len < len)
strbuf_grow(buf, len - buf->len);
dst = buf->buf;
- if (action == CRLF_GUESS) {
+ if (should_guess_text(action)) {
/*
* If we guessed, we already know we rejected a file with
* lone CR, and we can strip a CR without looking at what
@@ -224,13 +237,13 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
}
static int crlf_to_worktree(const char *path, const char *src, size_t len,
- struct strbuf *buf, int action)
+ struct strbuf *buf, enum action action)
{
char *to_free = NULL;
struct text_stat stats;
if ((action == CRLF_BINARY) || (action == CRLF_INPUT) ||
- auto_crlf <= 0)
+ (action != CRLF_CRLF && auto_crlf != AUTO_CRLF_TRUE))
return 0;
if (!len)
@@ -246,11 +259,13 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
if (stats.lf == stats.crlf)
return 0;
- if (action == CRLF_GUESS) {
- /* If we have any CR or CRLF line endings, we do not touch it */
- /* This is the new safer autocrlf-handling */
- if (stats.cr > 0 || stats.crlf > 0)
- return 0;
+ if (should_guess_text(action)) {
+ if (action == CRLF_GUESS) {
+ /* If we have any CR or CRLF line endings, we do not touch it */
+ /* This is the new safer autocrlf-handling */
+ if (stats.cr > 0 || stats.crlf > 0)
+ return 0;
+ }
/* If we have any bare CR characters, we're not going to touch it */
if (stats.cr != stats.crlf)
@@ -591,8 +606,12 @@ static int git_path_check_crlf(const char *path, struct git_attr_check *check)
return CRLF_BINARY;
else if (ATTR_UNSET(value))
;
- else if (!strcmp(value, "input"))
+ else if (!strcmp(value, "input") || !strcmp(value, "lf"))
return CRLF_INPUT;
+ else if (!strcmp(value, "crlf"))
+ return CRLF_CRLF;
+ else if (!strcmp(value, "auto"))
+ return CRLF_AUTO;
return CRLF_GUESS;
}
diff --git a/environment.c b/environment.c
index 876c5e5..db4a5e9 100644
--- a/environment.c
+++ b/environment.c
@@ -38,7 +38,7 @@ const char *pager_program;
int pager_use_color = 1;
const char *editor_program;
const char *excludes_file;
-int auto_crlf = 0; /* 1: both ways, -1: only when adding git objects */
+enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
int read_replace_refs = 1;
enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index 40048a7..f11fee4 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -41,7 +41,7 @@ test_expect_success 'default settings cause no changes' '
test -z "$onediff" -a -z "$twodiff"
'
-test_expect_failure 'crlf=true causes a CRLF file to be normalized' '
+test_expect_success 'crlf=true causes a CRLF file to be normalized' '
rm -f .gitattributes tmp one two &&
echo "two crlf" > .gitattributes &&
@@ -53,7 +53,7 @@ test_expect_failure 'crlf=true causes a CRLF file to be normalized' '
test -n "$twodiff"
'
-test_expect_failure 'crlf=crlf gives a normalized file CRLFs with autocrlf=false' '
+test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=false' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf false &&
@@ -65,7 +65,7 @@ test_expect_failure 'crlf=crlf gives a normalized file CRLFs with autocrlf=false
test -z "$onediff"
'
-test_expect_failure 'crlf=crlf gives a normalized file CRLFs with autocrlf=input' '
+test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=input' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf input &&
@@ -77,7 +77,7 @@ test_expect_failure 'crlf=crlf gives a normalized file CRLFs with autocrlf=input
test -z "$onediff"
'
-test_expect_failure 'crlf=lf gives a normalized file LFs with autocrlf=true' '
+test_expect_success 'crlf=lf gives a normalized file LFs with autocrlf=true' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf true &&
@@ -102,7 +102,7 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
test -z "$onediff" -a -z "$twodiff"
'
-test_expect_failure 'crlf=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'crlf=auto, autocrlf=true _does_ normalize CRLF files' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf true &&
--
1.7.1.3.g448cb.dirty
* [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-12 23:00 [PATCH v3 0/5] End-of-line normalization, redesigned Eyvind Bernhardsen
` (2 preceding siblings ...)
2010-05-12 23:00 ` [PATCH v3 3/5] Add " Eyvind Bernhardsen
@ 2010-05-12 23:00 ` Eyvind Bernhardsen
2010-05-13 1:38 ` Linus Torvalds
2010-05-12 23:00 ` [RFC/PATCH v3 5/5] Rename "core.autocrlf" config variable as "core.eolconv" Eyvind Bernhardsen
4 siblings, 1 reply; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-12 23:00 UTC (permalink / raw)
To: git
Cc: msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov,
Robert Buck, Finn Arne Gangstad, Jay Soffian
As discussed at length on the list, "crlf" is a pretty bad name for an
attribute that enables end-of-line conversion, and the addition of "lf"
and "crlf" values for it doesn't help.
Rename the attribute "eolconv", but fall back to "crlf" for backwards
compatibility if "eolconv" is not set.
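A minimal sketch of the fallback described above, written in Python rather than the patch's C. The names `classify` and `resolve` and the mode constants are hypothetical; the value mapping follows the gitattributes text in this patch ("input" as an alias for "lf", any other value treated as unspecified). One consequence worth noting, assuming the lookup behaves as modeled here: an unrecognized "eolconv" value falls through to "crlf" as well.

```python
# Hypothetical model of the lookup order: "eolconv" wins, and "crlf" is
# consulted only when "eolconv" gives no answer.
GUESS, TEXT, BINARY, AUTO, CRLF, LF = "guess", "text", "binary", "auto", "crlf", "lf"


def classify(value):
    """Map one attribute value to a conversion mode.

    `value` is True (set), False (unset), a string, or None (unspecified).
    """
    if value is True:
        return TEXT
    if value is False:
        return BINARY
    if value == "auto":
        return AUTO
    if value == "crlf":
        return CRLF
    if value in ("lf", "input"):  # the docs call "input" an alias for "lf"
        return LF
    return GUESS  # unspecified, or any unrecognized string


def resolve(attrs):
    """Return the mode for a path given its attributes as a dict."""
    mode = classify(attrs.get("eolconv"))
    if mode == GUESS:
        # Backwards compatibility: fall back to the old attribute name.
        mode = classify(attrs.get("crlf"))
    return mode
```

For example, `resolve({"eolconv": "lf", "crlf": "crlf"})` picks the new attribute, while `resolve({"crlf": True})` still honours an old .gitattributes file.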
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
Documentation/gitattributes.txt | 51 ++++++++++++++++++++------------------
attr.c | 2 +-
convert.c | 15 ++++++++---
git-cvsserver.perl | 8 ++++-
t/t0025-crlf-auto.sh | 31 ++++++++++++++++-------
5 files changed, 67 insertions(+), 40 deletions(-)
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index bb3b446..2887f85 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -92,30 +92,33 @@ such as 'git checkout' and 'git merge' run. They also affect how
git stores the contents you prepare in the working tree in the
repository upon 'git add' and 'git commit'.
-`crlf`
-^^^^^^
+`eolconv`
+^^^^^^^^^
This attribute enables and controls end-of-line normalization. When a
text file is normalized, its line endings are converted to LF in the
repository. Text files can have their line endings converted to
-CRLF in the working directory, using the `crlf` attribute for
+CRLF in the working directory, using the `eolconv` attribute for
individual files or the `core.autocrlf` configuration variable for all
files.
+For compatibility with older versions of git, `crlf` is an alias for
+this attribute.
+
Set::
- Setting the `crlf` attribute on a path enables end-of-line
+ Setting the `eolconv` attribute on a path enables end-of-line
normalization and marks the path as a text file. End-of-line
conversion takes place without guessing the content type.
Unset::
- Unsetting the `crlf` attribute on a path tells git not to
+ Unsetting the `eolconv` attribute on a path tells git not to
attempt any end-of-line conversion upon checkin or checkout.
Set to string value "auto"::
- When `crlf` is set to "auto", the path is marked for automatic
+ When `eolconv` is set to "auto", the path is marked for automatic
end-of-line normalization. If git decides that the content is
text, its line endings are normalized to LF on checkin.
@@ -134,13 +137,13 @@ Set to string value "lf"::
Unspecified::
- Leaving the `crlf` attribute unspecified tells git to apply
+ Leaving the `eolconv` attribute unspecified tells git to apply
end-of-line normalization only if the `core.autocrlf`
configuration variable is set, the content appears to be text,
and the file is either new or already normalized in the
repository.
-Any other value causes git to act as if `crlf` has been left
+Any other value causes git to act as if `eolconv` has been left
unspecified.
@@ -157,10 +160,10 @@ the working directory, and prevent .jpg files from being normalized
regardless of their content.
------------------------
-*.txt crlf
-*.vcproj crlf=crlf
-*.sh crlf=lf
-*.jpg -crlf
+*.txt eolconv
+*.vcproj eolconv=crlf
+*.sh eolconv=lf
+*.jpg -eolconv
------------------------
Other source code management systems normalize all text files in their
@@ -185,24 +188,24 @@ files without conversion to CRLF in the working directory.
If you want to interoperate with a source code management system that
enforces end-of-line normalization, or you simply want all text files
-in your repository to be normalized, you should instead set the `crlf`
+in your repository to be normalized, you should instead set the `eolconv`
attribute to "auto" for _all_ files.
------------------------
-* crlf=auto
+* eolconv=auto
------------------------
This ensures that all files that git considers to be text will have
normalized (LF) line endings in the repository.
-NOTE: When `crlf=auto` normalization is enabled in an existing
+NOTE: When `eolconv=auto` normalization is enabled in an existing
repository, any text files containing CRLFs should be normalized. If
they are not they will be normalized the next time someone tries to
change them, causing unfortunate misattribution. From a clean working
directory:
-------------------------------------------------
-$ echo "* crlf=auto" >>.gitattributes
+$ echo "* eolconv=auto" >>.gitattributes
# ...this should be the first line in .gitattributes
$ rm .git/index # Remove the index to force git to
$ git reset # re-scan the working directory
@@ -213,17 +216,17 @@ $ git commit -m "Introduce end-of-line normalization"
-------------------------------------------------
If any files that should not be normalized show up in 'git status',
-unset their `crlf` attribute before running 'git add -u'.
+unset their `eolconv` attribute before running 'git add -u'.
------------------------
-manual.pdf -crlf
+manual.pdf -eolconv
------------------------
Conversely, text files that git does not detect can have normalization
enabled manually.
------------------------
-weirdchars.txt crlf
+weirdchars.txt eolconv
------------------------
If `core.safecrlf` is set to "true" or "warn", git verifies if
@@ -309,11 +312,11 @@ Interaction between checkin/checkout attributes
In the check-in codepath, the worktree file is first converted
with `filter` driver (if specified and corresponding driver
defined), then the result is processed with `ident` (if
-specified), and then finally with `crlf` (again, if specified
+specified), and then finally with `eolconv` (again, if specified
and applicable).
In the check-out codepath, the blob content is first converted
-with `crlf`, and then `ident` and fed to `filter`.
+with `eolconv`, and then `ident` and fed to `filter`.
Generating diff text
@@ -717,7 +720,7 @@ You do not want any end-of-line conversions applied to, nor textual diffs
produced for, any binary file you track. You would need to specify e.g.
------------
-*.jpg -crlf -diff
+*.jpg -eolconv -diff
------------
but that may become cumbersome, when you have many attributes. Using
@@ -730,7 +733,7 @@ the same time. The system knows a built-in attribute macro, `binary`:
which is equivalent to the above. Note that the attribute macros can only
be "Set" (see the above example that sets "binary" macro as if it were an
-ordinary attribute --- setting it in turn unsets "crlf" and "diff").
+ordinary attribute --- setting it in turn unsets "eolconv" and "diff").
DEFINING ATTRIBUTE MACROS
@@ -741,7 +744,7 @@ at the toplevel (i.e. not in any subdirectory). The built-in attribute
macro "binary" is equivalent to:
------------
-[attr]binary -diff -crlf
+[attr]binary -diff -eolconv
------------
diff --git a/attr.c b/attr.c
index f5346ed..7f924bc 100644
--- a/attr.c
+++ b/attr.c
@@ -287,7 +287,7 @@ static void free_attr_elem(struct attr_stack *e)
}
static const char *builtin_attr[] = {
- "[attr]binary -diff -crlf",
+ "[attr]binary -diff -eolconv",
NULL,
};
diff --git a/convert.c b/convert.c
index 0eb3d4b..b46f85d 100644
--- a/convert.c
+++ b/convert.c
@@ -438,11 +438,13 @@ static int read_convert_config(const char *var, const char *value, void *cb)
static void setup_convert_check(struct git_attr_check *check)
{
+ static struct git_attr *attr_eolconv;
static struct git_attr *attr_crlf;
static struct git_attr *attr_ident;
static struct git_attr *attr_filter;
if (!attr_crlf) {
+ attr_eolconv = git_attr("eolconv");
attr_crlf = git_attr("crlf");
attr_ident = git_attr("ident");
attr_filter = git_attr("filter");
@@ -452,6 +454,7 @@ static void setup_convert_check(struct git_attr_check *check)
check[0].attr = attr_crlf;
check[1].attr = attr_ident;
check[2].attr = attr_filter;
+ check[3].attr = attr_eolconv;
}
static int count_ident(const char *cp, unsigned long size)
@@ -639,7 +642,7 @@ static int git_path_check_ident(const char *path, struct git_attr_check *check)
int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
{
- struct git_attr_check check[3];
+ struct git_attr_check check[4];
int crlf = CRLF_GUESS;
int ident = 0, ret = 0;
const char *filter = NULL;
@@ -647,7 +650,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
setup_convert_check(check);
if (!git_checkattr(path, ARRAY_SIZE(check), check)) {
struct convert_driver *drv;
- crlf = git_path_check_crlf(path, check + 0);
+ crlf = git_path_check_crlf(path, check + 3);
+ if (crlf == CRLF_GUESS)
+ crlf = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
if (drv && drv->clean)
@@ -669,7 +674,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
{
- struct git_attr_check check[3];
+ struct git_attr_check check[4];
int crlf = CRLF_GUESS;
int ident = 0, ret = 0;
const char *filter = NULL;
@@ -677,7 +682,9 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
setup_convert_check(check);
if (!git_checkattr(path, ARRAY_SIZE(check), check)) {
struct convert_driver *drv;
- crlf = git_path_check_crlf(path, check + 0);
+ crlf = git_path_check_crlf(path, check + 3);
+ if (crlf == CRLF_GUESS)
+ crlf = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
if (drv && drv->smudge)
diff --git a/git-cvsserver.perl b/git-cvsserver.perl
index 13751db..ede47a6 100755
--- a/git-cvsserver.perl
+++ b/git-cvsserver.perl
@@ -2369,8 +2369,12 @@ sub kopts_from_path
if ( defined ( $cfg->{gitcvs}{usecrlfattr} ) and
$cfg->{gitcvs}{usecrlfattr} =~ /\s*(1|true|yes)\s*$/i )
{
- my ($val) = check_attr( "crlf", $path );
- if ( $val eq "set" )
+ my ($val) = check_attr( "eolconv", $path );
+ if ( $val eq "unspecified" )
+ {
+ $val = check_attr( "crlf", $path );
+ }
+ if ( $val =~ /^(set|crlf|lf)$/ )
{
return "";
}
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index f11fee4..05e5725 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -41,9 +41,22 @@ test_expect_success 'default settings cause no changes' '
test -z "$onediff" -a -z "$twodiff"
'
-test_expect_success 'crlf=true causes a CRLF file to be normalized' '
+test_expect_success 'eolconv=true causes a CRLF file to be normalized' '
rm -f .gitattributes tmp one two &&
+ echo "two eolconv" > .gitattributes &&
+ git read-tree --reset -u HEAD &&
+
+ # Note, "normalized" means that git will normalize it if added
+ has_cr two &&
+ twodiff=`git diff two` &&
+ test -n "$twodiff"
+'
+
+test_expect_success 'crlf=true also causes a CRLF file to be normalized' '
+
+ # Backwards compatibility check
+ rm -f .gitattributes tmp one two &&
echo "two crlf" > .gitattributes &&
git read-tree --reset -u HEAD &&
@@ -53,11 +66,11 @@ test_expect_success 'crlf=true causes a CRLF file to be normalized' '
test -n "$twodiff"
'
-test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=false' '
+test_expect_success 'eolconv=crlf gives a normalized file CRLFs with autocrlf=false' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf false &&
- echo "one crlf=crlf" > .gitattributes &&
+ echo "one eolconv=crlf" > .gitattributes &&
git read-tree --reset -u HEAD &&
has_cr one &&
@@ -65,11 +78,11 @@ test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=false
test -z "$onediff"
'
-test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=input' '
+test_expect_success 'eolconv=crlf gives a normalized file CRLFs with autocrlf=input' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf input &&
- echo "one crlf=crlf" > .gitattributes &&
+ echo "one eolconv=crlf" > .gitattributes &&
git read-tree --reset -u HEAD &&
has_cr one &&
@@ -77,11 +90,11 @@ test_expect_success 'crlf=crlf gives a normalized file CRLFs with autocrlf=input
test -z "$onediff"
'
-test_expect_success 'crlf=lf gives a normalized file LFs with autocrlf=true' '
+test_expect_success 'eolconv=lf gives a normalized file LFs with autocrlf=true' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf true &&
- echo "one crlf=lf" > .gitattributes &&
+ echo "one eolconv=lf" > .gitattributes &&
git read-tree --reset -u HEAD &&
! has_cr one &&
@@ -102,11 +115,11 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
test -z "$onediff" -a -z "$twodiff"
'
-test_expect_success 'crlf=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'eolconv=auto, autocrlf=true _does_ normalize CRLF files' '
rm -f .gitattributes tmp one two &&
git config core.autocrlf true &&
- echo "* crlf=auto" > .gitattributes &&
+ echo "* eolconv=auto" > .gitattributes &&
git read-tree --reset -u HEAD &&
has_cr one &&
--
1.7.1.3.g448cb.dirty
* [RFC/PATCH v3 5/5] Rename "core.autocrlf" config variable as "core.eolconv"
2010-05-12 23:00 [PATCH v3 0/5] End-of-line normalization, redesigned Eyvind Bernhardsen
` (3 preceding siblings ...)
2010-05-12 23:00 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
@ 2010-05-12 23:00 ` Eyvind Bernhardsen
4 siblings, 0 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-12 23:00 UTC (permalink / raw)
To: git
Cc: msysGit, Linus Torvalds, Junio C Hamano, Dmitry Potapov,
Robert Buck, Finn Arne Gangstad, Jay Soffian
As asserted by myself and not vigorously contested on the list,
"autocrlf" is a pretty bad name. Rename the variable "core.eolconv",
but also accept "core.autocrlf" for backwards compatibility.
Also add aliases "crlf" for "true" and "lf" for "input".
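A rough sketch of how the accepted values map onto the three states, in Python rather than the patch's C. `parse_eolconv` and `config_bool` are hypothetical names; `config_bool` is a simplified stand-in for git's boolean parsing (it ignores general integer values) and is an assumption, not a copy of it.

```python
FALSE, TRUE, INPUT = "false", "true", "input"


def config_bool(value):
    """Simplified stand-in for git's boolean config parsing (assumption)."""
    if value is None:  # key present without "= value"
        return True
    v = value.lower()
    if v in ("true", "yes", "on", "1"):
        return True
    if v in ("false", "no", "off", "0", ""):
        return False
    raise ValueError("bad boolean config value: %r" % value)


def parse_eolconv(value):
    """Map a core.eolconv (or core.autocrlf) value to a conversion state."""
    if value is not None:
        v = value.lower()
        if v in ("input", "lf"):  # "lf" is the new alias for "input"
            return INPUT
        if v == "crlf":  # "crlf" is the new alias for "true"
            return TRUE
    return TRUE if config_bool(value) else FALSE
```

The aliases are checked before the boolean parse, so "crlf" and "lf" never reach the code path that would reject them as bad booleans.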
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
Documentation/config.txt | 22 ++++++++-------
Documentation/gitattributes.txt | 16 ++++++------
config.c | 11 ++++++--
t/t0020-crlf.sh | 54 +++++++++++++++++++++++++++++++++++++++
4 files changed, 82 insertions(+), 21 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4d3c472..6814e23 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -196,16 +196,18 @@ core.quotepath::
quoted without `-z` regardless of the setting of this
variable.
-core.autocrlf::
- If true, makes git convert `CRLF` at the end of lines in text files to
+core.eolconv::
+ If true or 'crlf', makes git convert `CRLF` at the end of lines in text files to
`LF` when reading from the work tree, and convert in reverse when
writing to the work tree. The variable can be set to
- 'input', in which case the conversion happens only while
+ 'input' or 'lf', in which case the conversion happens only while
reading from the work tree but files are written out to the work
tree with `LF` at the end of lines. A file is considered
- "text" (i.e. be subjected to the autocrlf mechanism) based on
+ "text" (i.e. subject to the eolconv mechanism) based on
the file's `crlf` attribute, or if `crlf` is unspecified,
based on the file's contents. See linkgit:gitattributes[5].
+ For backwards compatibility, `core.autocrlf` is an alias of
+ this variable.
core.safecrlf::
If true, makes git check if converting `CRLF` is reversible when
@@ -214,12 +216,12 @@ core.safecrlf::
For example, committing a file followed by checking out the
same file should yield the original file in the work tree. If
this is not the case for the current setting of
- `core.autocrlf`, git will reject the file. The variable can
+ `core.eolconv`, git will reject the file. The variable can
be set to "warn", in which case git will only warn about an
irreversible conversion but continue the operation.
+
CRLF conversion bears a slight chance of corrupting data.
-autocrlf=true will convert CRLF to LF during commit and LF to
+eolconv=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this is the right thing to do: it corrects line endings
@@ -243,9 +245,9 @@ converting CRLFs corrupts data.
+
Note, this safety check does not mean that a checkout will generate a
file identical to the original file for a different setting of
-`core.autocrlf`, but only for the current one. For example, a text
-file with `LF` would be accepted with `core.autocrlf=input` and could
-later be checked out with `core.autocrlf=true`, in which case the
+`core.eolconv`, but only for the current one. For example, a text
+file with `LF` would be accepted with `core.eolconv=input` and could
+later be checked out with `core.eolconv=true`, in which case the
resulting file would contain `CRLF`, although the original file
contained `LF`. However, in both work trees the line endings would be
consistent, that is either all `LF` or all `CRLF`, but never mixed. A
@@ -991,7 +993,7 @@ gitcvs.allbinary::
as binary files, which suppresses any newline munging it
otherwise might do. Alternatively, if it is set to "guess",
then the contents of the file are examined to decide if
- it is binary, similar to 'core.autocrlf'.
+ it is binary, similar to 'core.eolconv'.
gitcvs.dbname::
Database used by git-cvsserver to cache revision information
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 2887f85..7d02146 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -99,7 +99,7 @@ This attribute enables and controls end-of-line normalization. When a
text file is normalized, its line endings are converted to LF in the
repository. Text files can have their line endings converted to
CRLF in the working directory, using the `eolconv` attribute for
-individual files or the `core.autocrlf` configuration variable for all
+individual files or the `core.eolconv` configuration variable for all
files.
For compatibility with older versions of git, `crlf` is an alias for
@@ -126,19 +126,19 @@ Set to string value "crlf"::
This is similar to setting the attribute to `true`, but forces
git to convert line endings to CRLF when the file is checked
- out, regardless of `core.autocrlf`.
+ out, regardless of `core.eolconv`.
Set to string value "lf"::
This is similar to setting the attribute to `true`, but
prevents git from converting line endings to CRLF when the
- file is checked out, regardless of `core.autocrlf`. "input"
+ file is checked out, regardless of `core.eolconv`. "input"
is an alias for "lf".
Unspecified::
Leaving the `eolconv` attribute unspecified tells git to apply
- end-of-line normalization only if the `core.autocrlf`
+ end-of-line normalization only if the `core.eolconv`
configuration variable is set, the content appears to be text,
and the file is either new or already normalized in the
repository.
@@ -172,18 +172,18 @@ normalization in git.
If you simply want to have CRLF line endings in your working directory
regardless of the repository you are working in, you can set the
-config variable "core.autocrlf" without changing any attributes.
+config variable "core.eolconv" without changing any attributes.
------------------------
[core]
- autocrlf = true
+ eolconv = true
------------------------
This does not force normalization of all text files, but does ensure
that text files that you introduce to the repository have their line
endings normalized to LF when they are added, and that files that are
already normalized in the repository stay normalized. You can also
-set `autocrlf` to "input" to have automatic normalization of new text
+set `eolconv` to "input" to have automatic normalization of new text
files without conversion to CRLF in the working directory.
If you want to interoperate with a source code management system that
@@ -231,7 +231,7 @@ weirdchars.txt eolconv
If `core.safecrlf` is set to "true" or "warn", git verifies if
the conversion is reversible for the current setting of
-`core.autocrlf`. For "true", git rejects irreversible
+`core.eolconv`. For "true", git rejects irreversible
conversions; for "warn", git only prints a warning but accepts
an irreversible conversion. The safety triggers to prevent such
a conversion done to the files in the work tree, but there are a
diff --git a/config.c b/config.c
index b60a1ff..a5f445e 100644
--- a/config.c
+++ b/config.c
@@ -459,12 +459,17 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
- if (!strcmp(var, "core.autocrlf")) {
- if (value && !strcasecmp(value, "input")) {
+ if (!strcmp(var, "core.eolconv") || !strcmp(var, "core.autocrlf")) {
+ if (value && (!strcasecmp(value, "input") ||
+ !strcasecmp(value, "lf"))) {
auto_crlf = AUTO_CRLF_INPUT;
return 0;
}
- auto_crlf = git_config_bool(var, value);
+ if ((value && !strcasecmp(value, "crlf")) ||
+ git_config_bool(var, value))
+ auto_crlf = AUTO_CRLF_TRUE;
+ else
+ auto_crlf = AUTO_CRLF_FALSE;
return 0;
}
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index 234a94f..52c2b71 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -135,9 +135,35 @@ test_expect_success 'update with autocrlf=true' '
'
+test_expect_success 'checkout with eolconv=crlf' '
+
+ rm -f tmp one dir/two three &&
+ git config --unset-all core.autocrlf &&
+ git config core.eolconv crlf &&
+ git read-tree --reset -u HEAD &&
+
+ for f in one dir/two
+ do
+ remove_cr <"$f" >tmp && mv -f tmp $f &&
+ git update-index -- $f || {
+ echo "Eh? $f"
+ false
+ break
+ }
+ done &&
+ test "$one" = `git hash-object --stdin <one` &&
+ test "$two" = `git hash-object --stdin <dir/two` &&
+ differs=`git diff-index --cached HEAD` &&
+ test -z "$differs" || {
+ echo Oops "$differs"
+ false
+ }
+'
+
test_expect_success 'checkout with autocrlf=true' '
rm -f tmp one dir/two three &&
+ git config --unset-all core.eolconv &&
git config core.autocrlf true &&
git read-tree --reset -u HEAD &&
@@ -159,9 +185,37 @@ test_expect_success 'checkout with autocrlf=true' '
}
'
+test_expect_success 'checkout with eolconv=lf' '
+
+ rm -f tmp one dir/two three &&
+ git config --unset-all core.autocrlf &&
+ git config core.eolconv lf &&
+ git read-tree --reset -u HEAD &&
+
+ for f in one dir/two
+ do
+ if has_cr "$f"
+ then
+ echo "Eh? $f"
+ false
+ break
+ else
+ git update-index -- $f
+ fi
+ done &&
+ test "$one" = `git hash-object --stdin <one` &&
+ test "$two" = `git hash-object --stdin <dir/two` &&
+ differs=`git diff-index --cached HEAD` &&
+ test -z "$differs" || {
+ echo Oops "$differs"
+ false
+ }
+'
+
test_expect_success 'checkout with autocrlf=input' '
rm -f tmp one dir/two three &&
+ git config --unset-all core.eolconv &&
git config core.autocrlf input &&
git read-tree --reset -u HEAD &&
--
1.7.1.3.g448cb.dirty
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-12 23:00 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
@ 2010-05-13 1:38 ` Linus Torvalds
2010-05-13 9:39 ` Robert Buck
2010-05-13 10:59 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
0 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2010-05-13 1:38 UTC (permalink / raw)
To: Eyvind Bernhardsen
Cc: git, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck,
Finn Arne Gangstad, Jay Soffian
On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
>
> ------------------------
> -*.txt crlf
> -*.vcproj crlf=crlf
> -*.sh crlf=lf
> -*.jpg -crlf
> +*.txt eolconv
> +*.vcproj eolconv=crlf
> +*.sh eolconv=lf
> +*.jpg -eolconv
> ------------------------
...
> ------------------------
> -* crlf=auto
> +* eolconv=auto
> ------------------------
If you are doing the renaming, then I seriously object to this.
It makes no sense to say "eolconv=crlf" and then say "eolconv=auto". They
are two totally different things. One is _how_ line endings should
look, and the other is _whether_ line endings exist or not.
And "eolconv=crlf" makes no sense anyway. I assume "conv" is
conversion, but a conversion implies a from and a to. That's just a
"to", and it would make much more sense to just say "eol=crlf" for that
case.
Now, it _does_ make sense to say "eolconv=auto", but that's because it's
that totally different case: it's not about what the line ending
character is, it's about whether any eol conversion is done at all. So
for _that_ case, it makes sense to use "eolconv", although even for that
case I think the name is not very _good_.
So if you rename these things, keep them separate. Make the "am I a
text-file" boolean be a boolean (plus "auto"), and just call it "text".
And make the "what end of line to use" be just "eol" then.
So you can have
* text=auto,eol=crlf
that means "autodetect whether it is text, and use crlf as eol".
Now, I'd further suggest:
- "eol=xyz" with no "text" attribute automatically implies "text" being
true.
- "text=xyz" with no "eol" attribute implies "eol=native"
so now you can write:
*.jpg -text
*.txt text
*.vcproj eol=crlf
*.sh eol=lf
* text=auto
and that means:
- jpg files are binary
- *.txt files are text, and we use the default ("native") line ending for
them (implicit, since we don't have any matching eol rule)
- *.vcproj files are text (implicit), and we use CRLF line endings
- *.sh files are text (implicit), and we use UNIX style line endings
- everything else is auto-detected, and we implicitly use native line
endings for them
Doesn't that look finally sane?
Because if we really rename the attributes, let's rename them _right_.
Linus
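The two implication rules proposed above can be sketched in a few lines of Python. The function name `effective` and the returned tuples are hypothetical, and the handling of an explicit "-text" combined with an "eol" value is an assumption (the proposal does not cover it; here the explicit "-text" wins). Pattern matching and precedence between .gitattributes lines are deliberately left out.

```python
def effective(text=None, eol=None):
    """Combine already-resolved "text" and "eol" attributes for one path.

    text: True (set), False (unset), "auto", or None (unspecified)
    eol:  "lf", "crlf", or None (unspecified)
    Returns (text_mode, eol_mode).
    """
    if text is None and eol is not None:
        text = True  # "eol=xyz" with no "text" implies text
    if text is False:
        return ("binary", None)  # assumption: explicit -text wins
    if text is None:
        return ("unspecified", None)  # neither attribute given
    if eol is None:
        eol = "native"  # "text=xyz" with no "eol" implies native
    return ("auto" if text == "auto" else "text", eol)
```

With the example file above, `effective(eol="crlf")` gives `("text", "crlf")` for *.vcproj and `effective(text="auto")` gives `("auto", "native")` for everything else.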
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 1:38 ` Linus Torvalds
@ 2010-05-13 9:39 ` Robert Buck
2010-05-13 9:58 ` Robert Buck
2010-05-13 10:59 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
1 sibling, 1 reply; 27+ messages in thread
From: Robert Buck @ 2010-05-13 9:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: Eyvind Bernhardsen, git, msysGit, Junio C Hamano, Dmitry Potapov,
Finn Arne Gangstad, Jay Soffian
[...]
> Now, it _does_ make sense to say "eolconv=auto", but that's because it's
> that totally different case: it's not about what the line ending
> character is, it's about whether any eol conversion is done at all. So
> for _that_ case, it makes sense to use "eolconv", although even for that
> case I think the name is not very _good_.
> So if you rename these things, keep them separate. Make the "am I a
> text-file" boolean be a boolean (plus "auto"), and just call it "text".
> And make the "what end of line to use" be just "eol" then.
>
> So you can have
>
> * text=auto,eol=crlf
>
> that means "autodetect whether it is text, and use crlf as eol".
>
> Now, I'd further suggest:
>
> - "eol=xyz" with no "text" attribute automatically implies "text" being
> true.
> - "text=xyz" with no "eol" attribute implies "eol=native"
>
> so now you can write:
>
> *.jpg -text
> *.txt text
> *.vcproj eol=crlf
> *.sh eol=lf
> * text=auto
>
> and that means:
>
> - jpg files are binary
> - *.txt files are text, and we use the default ("native") line ending for
> them (implicit, since we don't have any matching eol rule)
> - *.vcproj files are text (implicit), and we use CRLF line endings
> - *.sh files are text (implicit), and we use UNIX style line endings
> - everything else is auto-detected, and we implicitly use native line
> endings for them
>
> Doesn't that look finally sane?
>
> Because if we really rename the attributes, let's rename them _right_.
>
> Linus
>
Love it!
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 9:39 ` Robert Buck
@ 2010-05-13 9:58 ` Robert Buck
2010-05-13 11:47 ` Eyvind Bernhardsen
0 siblings, 1 reply; 27+ messages in thread
From: Robert Buck @ 2010-05-13 9:58 UTC (permalink / raw)
To: Linus Torvalds
Cc: Eyvind Bernhardsen, git, msysGit, Junio C Hamano, Dmitry Potapov,
Finn Arne Gangstad, Jay Soffian
Quick question here, while people would be in the convert.c functions
when making the above changes. This question is related to detecting
whether a file is text, but the question could be spun off to a
different thread if you so wish...
Have you considered skipping the UTF8 BOM and provided that the
remaining content is considered text allow auto conversions? The check
is simple, and would cover at least 50% of latin-derived languages.
Since you have the buffer at hand, and are in the same file
(convert.c), simply check for an initial EF BB BF. This would fix some
text files created on Windows (someone had mentioned Notepad I
believe). Out of the box experience for eol and text detection for
Windows users would be improved.
Bob
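For illustration only, the check suggested above (an initial EF BB BF) looks like this in Python. `strip_utf8_bom` is a hypothetical helper, not something in convert.c, and whether git needs such a step at all is a separate question.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes EF BB BF


def strip_utf8_bom(buf):
    """Return buf without a leading UTF-8 BOM, if it has one."""
    if buf.startswith(UTF8_BOM):
        return buf[len(UTF8_BOM):]
    return buf
```

A text heuristic would then run on `strip_utf8_bom(buf)` instead of the raw buffer.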
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 1:38 ` Linus Torvalds
2010-05-13 9:39 ` Robert Buck
@ 2010-05-13 10:59 ` Eyvind Bernhardsen
2010-05-13 21:45 ` Linus Torvalds
1 sibling, 1 reply; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-13 10:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: git, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck,
Finn Arne Gangstad, Jay Soffian
On 13. mai 2010, at 03.38, Linus Torvalds wrote:
> so now you can write:
>
> *.jpg -text
> *.txt text
> *.vcproj eol=crlf
> *.sh eol=lf
> * text=auto
[...]
> Doesn't that look finally sane?
>
> Because if we really rename the attributes, let's rename them _right_.
Beautiful.
Do you agree that "native" eol should only be CRLF if autocrlf is true? Otherwise, if .gitattributes looks like this:
*.txt text
git will put CRLFs in .txt files but LFs in .c files, and I don't think that makes much sense.
--
Eyvind
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 9:58 ` Robert Buck
@ 2010-05-13 11:47 ` Eyvind Bernhardsen
2010-05-13 13:19 ` Robert Buck
2010-05-14 10:16 ` utf8 BOM Dmitry Potapov
0 siblings, 2 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-13 11:47 UTC (permalink / raw)
To: Robert Buck; +Cc: git@vger.kernel.org List, msysGit
On 13. mai 2010, at 11.58, Robert Buck wrote:
> Quick question here, while people would be in the convert.c functions
> when making the above changes. This question is related to detecting
> whether a file is text, but the question could be spun off to a
> different thread if you so wish...
>
> Have you considered skipping the UTF8 BOM and provided that the
> remaining content is considered text allow auto conversions? The check
> is simple, and would cover at least 50% of latin-derived languages.
> Since you have the buffer at hand, and are in the same file
> (convert.c), simply check for an initial EF BB BF. This would fix some
> text files created on Windows (someone had mentioned Notepad I
> believe). Out of the box experience for eol and text detection for
> Windows users would be improved.
I just did a quick test with a plain text file; it was detected as text both with and without a utf8 BOM. Looking at the code, characters >= 128 are considered printable so the BOM shouldn't make any difference at all. Do you have an example utf8 text file that is misdetected as binary?
--
Eyvind
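To illustrate why the BOM makes no difference, here is a simplified text heuristic in Python in which bytes >= 128 count as printable. The function name `looks_like_text`, the list of tolerated control characters, and the 1-in-128 threshold are assumptions modeled loosely on the behaviour described above, not a copy of convert.c.

```python
def looks_like_text(buf):
    """Guess whether a bytes buffer is text (simplified, assumed rules)."""
    printable = 0
    nonprintable = 0
    for b in buf:
        if b in (0x0A, 0x0D):
            continue  # line endings are not counted either way
        # DEL and most control characters count against "text";
        # backspace, tab, escape and form feed are tolerated here.
        if b == 0x7F or (b < 0x20 and b not in (0x08, 0x09, 0x1B, 0x0C)):
            nonprintable += 1
        else:
            printable += 1  # includes every byte >= 128
    # Assumed threshold: at most one nonprintable byte per 128 printable.
    return (printable >> 7) >= nonprintable
```

Under these rules the BOM bytes EF BB BF simply add three printable bytes, so a file is classified the same way with or without it.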
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 11:47 ` Eyvind Bernhardsen
@ 2010-05-13 13:19 ` Robert Buck
2010-05-14 10:16 ` utf8 BOM Dmitry Potapov
1 sibling, 0 replies; 27+ messages in thread
From: Robert Buck @ 2010-05-13 13:19 UTC (permalink / raw)
To: Eyvind Bernhardsen; +Cc: git@vger.kernel.org List, msysGit
On Thu, May 13, 2010 at 7:47 AM, Eyvind Bernhardsen
<eyvind.bernhardsen@gmail.com> wrote:
> On 13. mai 2010, at 11.58, Robert Buck wrote:
>
>> Quick question here, while people would be in the convert.c functions
>> when making the above changes. This question is related to detecting
>> whether a file is text, but the question could be spun off to a
>> different thread if you so wish...
>>
>> Have you considered skipping the UTF8 BOM and provided that the
>> remaining content is considered text allow auto conversions? The check
>> is simple, and would cover at least 50% of latin-derived languages.
>> Since you have the buffer at hand, and are in the same file
>> (convert.c), simply check for an initial EF BB BF. This would fix some
>> text files created on Windows (someone had mentioned Notepad I
>> believe). Out of the box experience for eol and text detection for
>> Windows users would be improved.
>
> I just did a quick test with a plain text file; it was detected as text both with and without a utf8 BOM. Looking at the code, characters >= 128 are considered printable so the BOM shouldn't make any difference at all. Do you have an example utf8 text file that is misdetected as binary?
Sorry, my bad. I misread a line in convert.c. It handles UTF-8 beautifully.
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 10:59 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
@ 2010-05-13 21:45 ` Linus Torvalds
2010-05-14 2:34 ` Robert Buck
2010-05-14 21:16 ` Eyvind Bernhardsen
0 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2010-05-13 21:45 UTC (permalink / raw)
To: Eyvind Bernhardsen
Cc: git, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck,
Finn Arne Gangstad, Jay Soffian
On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
>
> Do you agree that "native" eol should only be CRLF if autocrlf is true?
Not really. We're trying to get _away_ from .gitattributes depending on
autocrlf, aren't we?
> Otherwise, if .gitattributes looks like this:
>
> *.txt text
>
> git will put CRLFs in .txt files but LFs in .c files, and I don't think
> that makes much sense.
Well, but that's what you asked for, isn't it? And I don't see why you say
*.c files would have LF's, since that depends on what you put in them: and
under Windows, that might well be CRLF.
And I do think it's perfectly reasonable to override the "native" mode in
your .git/config. If we're renaming the attributes, we might as well then
introduce a
[core]
eol=lf
to set the "native" EOL for that repo, exactly because presumably a number
of Windows people would like to see the saner LF-only model rather than
the traditional native CRLF.
In fact, maybe it would even make sense to just make LF the default
"native" end-of-line sequence even on windows, so that Windows people who
actually want CRLF would have to set core.eol=crlf. Whatever. That would
be for the Windows git users to fight out, I don't care.
But if we are going to clean up text attribute handling, then I really
think we want to totally break that old "core.autocrlf" dependency.
Linus
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 21:45 ` Linus Torvalds
@ 2010-05-14 2:34 ` Robert Buck
2010-05-14 4:56 ` Jonathan Nieder
2010-05-14 21:32 ` Eyvind Bernhardsen
2010-05-14 21:16 ` Eyvind Bernhardsen
1 sibling, 2 replies; 27+ messages in thread
From: Robert Buck @ 2010-05-14 2:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Eyvind Bernhardsen, git, msysGit, Junio C Hamano, Dmitry Potapov,
Finn Arne Gangstad, Jay Soffian
Probably a newbie question, lots to read, lots already read, but I
really want to verify if I have this correct. So in a nutshell, in the
gitattributes file
* text
*.foo binary
means autoconvert everything regardless of the autocrlf setting,
except for .foo files ? So now we can dispense with the autocrlf
attribute altogether if we so wish?
- Bob
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-14 2:34 ` Robert Buck
@ 2010-05-14 4:56 ` Jonathan Nieder
2010-05-14 21:21 ` Eyvind Bernhardsen
2010-05-14 21:32 ` Eyvind Bernhardsen
1 sibling, 1 reply; 27+ messages in thread
From: Jonathan Nieder @ 2010-05-14 4:56 UTC (permalink / raw)
To: Robert Buck
Cc: Linus Torvalds, Eyvind Bernhardsen, git, msysGit, Junio C Hamano,
Dmitry Potapov, Finn Arne Gangstad, Jay Soffian
Hi Bob,
Robert Buck wrote:
> * text
> *.foo binary
>
> means autoconvert everything regardless of the autocrlf setting,
> except for .foo files ? So now we can dispense with the autocrlf
> attribute altogether if we so wish?
If I understand correctly, there is no autocrlf attribute, just a
configuration item. If you put
* crlf
*.foo -crlf
in your .gitattributes with current git, this means:
- if the '[core] autocrlf' configuration is not set, do not convert
anything;
- otherwise, convert everything except for .foo files
Eyvind’s series improves that in a few ways.
- [from Finn Arne Gangstad] If the in-repository copy of a file
contains any carriage returns, do not try to convert it. This
makes it easier to deal with mistakes.
- For files with crlf enabled through attributes, always convert,
whether '[core] autocrlf' is enabled or not.
- Use the '[core] autocrlf' setting to determine the desired
line-ending for checked-out files (\r\n if true, \n otherwise).
A new eol attribute is provided to override that setting.
- The crlf attribute gets a new synonym "text" to avoid confusion.
There is also some change to the result of file type autodetection,
but as long as your .gitattributes uses '* crlf' or '* -crlf', there
is no need to worry about this.
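The rules listed above can be sketched as a small decision function. This is only an illustration of the behaviour as described in this message, not code from the series: the names crlf_attr, has_cr and should_convert are hypothetical, and content autodetection is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the crlf/text attribute state for one path. */
enum crlf_attr { ATTR_UNSPECIFIED, ATTR_SET, ATTR_UNSET };

/* Returns 1 if the buffer contains at least one carriage return. */
int has_cr(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	return len > 0 && memchr(buf, '\r', len) != NULL;
}

/*
 * Returns 1 if line endings should be converted for this path.
 *  - attribute set:   always convert, whatever autocrlf says
 *  - attribute unset: never convert
 *  - unspecified:     convert only if autocrlf is enabled and the
 *                     in-repository copy has no CRs ("safe autocrlf")
 */
int should_convert(enum crlf_attr attr, int autocrlf_enabled,
		   int index_copy_has_cr)
{
	switch (attr) {
	case ATTR_SET:
		return 1;
	case ATTR_UNSET:
		return 0;
	default:
		return autocrlf_enabled && !index_copy_has_cr;
	}
}
```

For example, should_convert(ATTR_SET, 0, 1) yields 1, which matches the point that an explicit attribute takes effect without '[core] autocrlf' and skips the index check.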
Hope that helps,
Jonathan
* utf8 BOM
2010-05-13 11:47 ` Eyvind Bernhardsen
2010-05-13 13:19 ` Robert Buck
@ 2010-05-14 10:16 ` Dmitry Potapov
2010-05-15 20:23 ` Eyvind Bernhardsen
1 sibling, 1 reply; 27+ messages in thread
From: Dmitry Potapov @ 2010-05-14 10:16 UTC (permalink / raw)
To: Eyvind Bernhardsen; +Cc: Robert Buck, git@vger.kernel.org List, msysGit
On Thu, May 13, 2010 at 01:47:45PM +0200, Eyvind Bernhardsen wrote:
>
> I just did a quick test with a plain text file; it was detected as
> text both with and without a utf8 BOM. Looking at the code,
> characters >= 128 are considered printable so the BOM shouldn't make
> any difference at all. Do you have an example utf8 text file that is
> misdetected as binary?
Though the UTF-8 BOM does not present any problem for the automatic text
detector, it is another piece from Microsoft that creates some
interoperability issues when you work with non-ASCII text files.
In short:
1. Microsoft editors and tools like to add a utf8 BOM to files, and
you cannot turn this behavior off.
2. Many tools (such as the Microsoft compiler) are incapable of recognizing
UTF-8 files without a BOM, so they screw up all non-ASCII chars.
#1 is a problem, because it creates changes consisting solely of adding
a utf8 BOM. Moreover, users of non-Windows platforms are not exactly
thrilled with having a utf8 BOM at the beginning of every text file.
Probably, the ability to automatically add a utf8 BOM on Windows to text
files (which are marked as "unicode") could be helpful, but it is just a
part of the problem of how to deal with text files in "legacy" encodings,
which are still widely used on Windows.
Dmitry
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-13 21:45 ` Linus Torvalds
2010-05-14 2:34 ` Robert Buck
@ 2010-05-14 21:16 ` Eyvind Bernhardsen
2010-05-14 21:27 ` Linus Torvalds
1 sibling, 1 reply; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-14 21:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: git, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck,
Finn Arne Gangstad, Jay Soffian
On 13. mai 2010, at 23.45, Linus Torvalds wrote:
> On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
>>
>> Do you agree that "native" eol should only be CRLF if autocrlf is true?
>
> Not really. We're trying to get _away_ from .gitattributes depending on
> autocrlf, aren't we?
I'm not sure we still are. I certainly was when I started this series, but that was because autocrlf just plain didn't work with many existing repositories. When "safe autocrlf" fixed that, I decided that the extra complexity of core.eolStyle wasn't worth it.
I could be wrong, and I'd be happy to add it later. I don't think this series requires it, though.
I'd like to make my terms explicit: when I say "core.autocrlf", I mean a config value that makes git normalize all text files automagically. "core.eol" would be a different config value that simply tells git what line endings to put in files that are explicitly flagged as "text" (or automatically detected by "text=auto").
>> Otherwise, if .gitattributes looks like this:
>>
>> *.txt text
>>
>> git will put CRLFs in .txt files but LFs in .c files, and I don't think
>> that makes much sense.
>
> Well, but that's what you asked for, isn't it? And I don't see why you say
> *.c files would have LF's, since that depends on what you put in them: and
> under Windows, that might well be CRLF.
That's not an interesting problem. If you're okay with CRLFs in your repository there's no need for you to use text file normalization at all, and you're certainly not going to bother to set any text attributes. Everything will Just Work.
To make it more relevant, let's consider what would happen if you suddenly wanted to share that repository with a Linux user. You would clearly have been better off if the text files had been normalized, but I can only see three ways this could happen:
1. You set "* text=auto" when you created the repository
2. text=auto is the default for all files
3. autocrlf=true is set by default on Windows
The first option is unrealistic, and we probably agree that the second one is a bad idea. That's why, once Finn Arne fixed autocrlf, I realized it's not all that bad.
> And I do think it's perfectly reasonable to override the "native" mode in
> your .git/config. If we're renaming the attributes, we might as well then
> introduce a
>
> [core]
> eol=lf
>
> to set the "native" EOL for that repo, exactly because presumably a number
> of Windows people would like to see the saner LF-only model rather than
> the traditional native CRLF.
But they can equally easily set "core.autocrlf=false". Although the name still grates.
> In fact, maybe it would even make sense to just make LF the default
> "native" end-of-line sequence even on windows, so that Windows people who
> actually want CRLF would have to set core.eol=crlf. Whatever. That would
> be for the Windows git users to fight out, I don't care.
This is the crux of the problem. It's possible that I'm just being prejudiced, but I think that if someone wants CRLF as a _default_ they probably want it to be the default for all text files, not just normalized ones.
> But if we are going to clean up text attribute handling, then I really
> think we want to totally break that old "core.autocrlf" dependency.
"core.autocrlf=true" is exactly equivalent to "core.eol=crlf" in a repository with "* text=auto" (setting the "text" attribute disables the index check).
In a repository that doesn't care, "core.autocrlf=true" will normalize your text files and put CRLFs in them, while "core.eol=crlf" won't do a thing.
Unless you're simply arguing for renaming autocrlf to eol?
--
Eyvind
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-14 4:56 ` Jonathan Nieder
@ 2010-05-14 21:21 ` Eyvind Bernhardsen
0 siblings, 0 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-14 21:21 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Robert Buck, Linus Torvalds, git, msysGit, Junio C Hamano,
Dmitry Potapov, Finn Arne Gangstad, Jay Soffian
On 14. mai 2010, at 06.56, Jonathan Nieder wrote:
[Lots of good answers cut]
> - The crlf attribute gets a new synonym "text" to avoid confusion.
I would prefer to phrase that as "the text attribute has the synonym 'crlf' for backwards compatibility". If I wanted to avoid confusion I wouldn't have renamed it ;)
--
Eyvind
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-14 21:16 ` Eyvind Bernhardsen
@ 2010-05-14 21:27 ` Linus Torvalds
2010-05-15 20:47 ` [PATCH] Add "core.eol" variable to control end-of-line conversion Eyvind Bernhardsen
0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2010-05-14 21:27 UTC (permalink / raw)
To: Eyvind Bernhardsen
Cc: git, msysGit, Junio C Hamano, Dmitry Potapov, Robert Buck,
Finn Arne Gangstad, Jay Soffian
On Fri, 14 May 2010, Eyvind Bernhardsen wrote:
> On 13. mai 2010, at 23.45, Linus Torvalds wrote:
>
> > On Thu, 13 May 2010, Eyvind Bernhardsen wrote:
> >>
> >> Do you agree that "native" eol should only be CRLF if autocrlf is true?
> >
> > Not really. We're trying to get _away_ from .gitattributes depending on
> > autocrlf, aren't we?
>
> I'm not sure we still are. I certainly was when I started this series,
> but that was because autocrlf just plain didn't work with many existing
> repositories. When "safe autocrlf" fixed that, I decided that the extra
> complexity of core.eolStyle wasn't worth it.
The thing is, I disagree with your notion of "safe autocrlf". I think it's
ugly, and I don't think it's safe at all. It adds a _feeling_ of safety
that isn't actually safe.
In short:
- core.autocrlf is _always_ dangerous. Your "safe" thing isn't any safer
at all, since it depends on something that isn't reliable (previous
state).
Example: new binary files, or changed files, or renames.
- so if you want text conversion, but you want it to be truly safe, and
only happen for certain files, YOU MUST NOT ENABLE autocrlf.
- Ergo: if you make the .gitattributes behaviour depend on autocrlf,
you're still screwed, and you've not actually improved on anything at
all in the end.
It's really that simple. I think "autocrlf" actually works pretty well,
but at the same time, I think we made mistakes in the initial design.
Let's not make them again.
Linus
* Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"
2010-05-14 2:34 ` Robert Buck
2010-05-14 4:56 ` Jonathan Nieder
@ 2010-05-14 21:32 ` Eyvind Bernhardsen
1 sibling, 0 replies; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-14 21:32 UTC (permalink / raw)
To: Robert Buck
Cc: Linus Torvalds, git@vger.kernel.org List, msysGit, Junio C Hamano,
Dmitry Potapov, Finn Arne Gangstad, Jay Soffian, Jonathan Nieder
On 14. mai 2010, at 04.34, Robert Buck wrote:
> Probably a newbie question, lots to read, lots already read, but I
> really want to verify if I have this correct. So in a nutshell, in the
> gitattributes file
>
> * text
I missed this when I replied to Jonathan, but you probably want "* text=auto" here. "* text" would force git to treat all files as text files.
Also, as Jonathan said, if you want CRLF line endings you currently have to have core.autocrlf set to "true" (which is the default on Windows).
--
Eyvind
* Re: utf8 BOM
2010-05-14 10:16 ` utf8 BOM Dmitry Potapov
@ 2010-05-15 20:23 ` Eyvind Bernhardsen
2010-05-16 5:19 ` Dmitry Potapov
0 siblings, 1 reply; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-15 20:23 UTC (permalink / raw)
To: Dmitry Potapov; +Cc: Robert Buck, git@vger.kernel.org List, msysGit
On 14. mai 2010, at 12.16, Dmitry Potapov wrote:
> Probably, the ability to automatically add a utf8 BOM on Windows to text
> files (which are marked as "unicode") could be helpful, but it is just a
> part of the problem of how to deal with text files in "legacy" encodings,
> which are still widely used on Windows.
Sounds like something a clean/smudge filter should be able to do. The clean filter converts legacy encoded text to utf8 and strips any utf8 BOM before checking the file in, and the smudge filter writes the file out as utf8 with a BOM (which hopefully works no matter what your code page is? I don't know much about Windows i18n).
Adding this to convert.c would be more difficult, at least politically, since I assume it would be Windows-specific code.
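The BOM half of that clean/smudge idea can be sketched in a few lines of C. This is a minimal illustration under stated assumptions: the function names are hypothetical, conversion from legacy encodings is not covered, and a real filter would read stdin and write stdout rather than work on caller-supplied buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char utf8_bom[3] = { 0xEF, 0xBB, 0xBF };

/* Returns 1 if the buffer starts with the UTF-8 BOM (EF BB BF). */
int has_utf8_bom(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	return len >= 3 && memcmp(buf, utf8_bom, 3) == 0;
}

/* "clean" direction: number of leading bytes to drop before check-in. */
size_t clean_skip_bom(const unsigned char *buf, size_t len)
{
	return has_utf8_bom(buf, len) ? 3 : 0;
}

/*
 * "smudge" direction: write BOM + content into dst (capacity cap).
 * Returns the number of bytes written, or 0 if dst is too small.
 * An existing BOM in src is skipped so the output never has two.
 */
size_t smudge_add_bom(unsigned char *dst, size_t cap,
		      const unsigned char *src, size_t len)
{
	size_t skip = clean_skip_bom(src, len);
	size_t body = len - skip;

	if (cap < 3 + body)
		return 0;
	memcpy(dst, utf8_bom, 3);
	if (body > 0)
		memcpy(dst + 3, src + skip, body);
	return 3 + body;
}
```

Skipping an existing BOM in the smudge direction keeps the round trip stable: cleaning the smudged output gives back the repository content unchanged.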
--
Eyvind
* [PATCH] Add "core.eol" variable to control end-of-line conversion
2010-05-14 21:27 ` Linus Torvalds
@ 2010-05-15 20:47 ` Eyvind Bernhardsen
2010-05-16 10:39 ` Robert Buck
0 siblings, 1 reply; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-15 20:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, msysGit, Junio C Hamano, Finn Arne Gangstad
Introduce a new configuration variable, "core.eol", that allows the user
to set which line endings to use for end-of-line-normalized files in the
working directory. It defaults to "native", which means CRLF on Windows
and LF everywhere else.
For backwards compatibility, "core.autocrlf" will override core.eol if
core.eol is left unset. This means that
[core]
autocrlf = true
will give CRLFs in the working directory even on platforms with LF as
their native line ending.
If core.eol is set explicitly (including setting it to "native"), it
will override core.autocrlf so that
[core]
autocrlf = true
eol = lf
normalizes all files that look like text, but does not put CRLFs in the
working directory.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
It turns out that my resistance to "core.eol" was mostly laziness, so I
just implemented it.
I decided that "core.autocrlf" has to override the native line ending if
"core.eol" isn't set explicitly, which gives some extra complexity in
convert.c.
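The precedence described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the enum and function names are hypothetical, and a "native" setting is assumed to have been resolved to LF or CRLF by the caller before this function is reached.

```c
#include <assert.h>

/* Hypothetical stand-ins for the core.eol and core.autocrlf settings. */
enum eol_cfg { CFG_EOL_UNSET, CFG_EOL_LF, CFG_EOL_CRLF };
enum autocrlf_cfg { CFG_AUTOCRLF_FALSE, CFG_AUTOCRLF_TRUE, CFG_AUTOCRLF_INPUT };

/*
 * Returns 1 if a normalized text file should be checked out with CRLF,
 * 0 if it should keep LF.
 *  - an explicit core.eol always wins
 *  - otherwise autocrlf=true forces CRLF and autocrlf=input forces LF
 *  - otherwise the platform's native line ending is used
 */
int worktree_wants_crlf(enum eol_cfg eol, enum autocrlf_cfg autocrlf,
			int native_is_crlf)
{
	assert(native_is_crlf == 0 || native_is_crlf == 1);

	if (eol != CFG_EOL_UNSET)
		return eol == CFG_EOL_CRLF;

	switch (autocrlf) {
	case CFG_AUTOCRLF_TRUE:
		return 1;
	case CFG_AUTOCRLF_INPUT:
		return 0;
	default:
		return native_is_crlf;
	}
}
```

So autocrlf = true with eol = lf gives LF in the working directory, while autocrlf = true alone gives CRLF even where LF is native.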
For 1.8 I would consider making core.autocrlf just turn on normalization
and leave the working directory line ending decision to core.eol, but
that _will_ break people's setups.
Patch is on top of my latest series.
--
Eyvind
Documentation/config.txt | 8 ++++
Documentation/gitattributes.txt | 6 ++-
Makefile | 3 +
cache.h | 13 ++++++
config.c | 12 ++++++
convert.c | 39 +++++++++++-------
environment.c | 1 +
t/t0026-eol-config.sh | 83 +++++++++++++++++++++++++++++++++++++++
8 files changed, 149 insertions(+), 16 deletions(-)
create mode 100755 t/t0026-eol-config.sh
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 207351b..7cc15a4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -207,6 +207,14 @@ core.autocrlf::
the file's `text` attribute, or if `text` is unspecified,
based on the file's contents. See linkgit:gitattributes[5].
+core.eol::
+ Sets the line ending type to use in the working directory for
+ files that have the `text` property set. Alternatives are
+ 'lf', 'crlf' and 'native', which uses the platform's native
+ line ending. The default value is `native`. See
+ linkgit:gitattributes[5] for more information on end-of-line
+ conversion.
+
core.safecrlf::
If true, makes git check if converting `CRLF` is reversible when
end-of-line conversion is active. Git will verify if a command
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 25753b7..8268c09 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -207,7 +207,11 @@ attribute to "auto" for _all_ files.
------------------------
This ensures that all files that git considers to be text will have
-normalized (LF) line endings in the repository.
+normalized (LF) line endings in the repository. The `core.eol`
+configuration variable controls which line endings git will use for
+normalized files in your working directory; the default is to use the
+native line ending for your platform, or CRLF if `core.autocrlf` is
+set.
NOTE: When `text=auto` normalization is enabled in an existing
repository, any text files containing CRLFs should be normalized. If
diff --git a/Makefile b/Makefile
index 910f471..419532e 100644
--- a/Makefile
+++ b/Makefile
@@ -224,6 +224,8 @@ all::
#
# Define CHECK_HEADER_DEPENDENCIES to check for problems in the hard-coded
# dependency rules.
+#
+# Define NATIVE_CRLF if your platform uses CRLF for line endings.
GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -989,6 +991,7 @@ ifeq ($(uname_S),Windows)
NO_CURL = YesPlease
NO_PYTHON = YesPlease
BLK_SHA1 = YesPlease
+ NATIVE_CRLF = YesPlease
CC = compat/vcbuild/scripts/clink.pl
AR = compat/vcbuild/scripts/lib.pl
diff --git a/cache.h b/cache.h
index d1f669e..ac6bfbd 100644
--- a/cache.h
+++ b/cache.h
@@ -568,6 +568,19 @@ enum auto_crlf {
extern enum auto_crlf auto_crlf;
+enum eol {
+ EOL_UNSET,
+ EOL_CRLF,
+ EOL_LF,
+#ifdef NATIVE_CRLF
+ EOL_NATIVE = EOL_CRLF
+#else
+ EOL_NATIVE = EOL_LF
+#endif
+};
+
+extern enum eol eol;
+
enum branch_track {
BRANCH_TRACK_UNSPECIFIED = -1,
BRANCH_TRACK_NEVER = 0,
diff --git a/config.c b/config.c
index b60a1ff..4edd940 100644
--- a/config.c
+++ b/config.c
@@ -477,6 +477,18 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.eol")) {
+ if (value && !strcasecmp(value, "lf"))
+ eol = EOL_LF;
+ else if (value && !strcasecmp(value, "crlf"))
+ eol = EOL_CRLF;
+ else if (value && !strcasecmp(value, "native"))
+ eol = EOL_NATIVE;
+ else
+ eol = EOL_UNSET;
+ return 0;
+ }
+
if (!strcmp(var, "core.notesref")) {
notes_ref_name = xstrdup(value);
return 0;
diff --git a/convert.c b/convert.c
index a309e07..b7ee469 100644
--- a/convert.c
+++ b/convert.c
@@ -20,12 +20,6 @@ enum action {
CRLF_AUTO,
};
-enum eol {
- EOL_UNSET,
- EOL_LF,
- EOL_CRLF,
-};
-
struct text_stat {
/* NUL, CR, LF and CRLF counts */
unsigned nul, cr, lf, crlf;
@@ -244,12 +238,27 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
char *to_free = NULL;
struct text_stat stats;
- if ((action == CRLF_BINARY) || (action == CRLF_INPUT) ||
- (action != CRLF_CRLF && auto_crlf != AUTO_CRLF_TRUE))
+ if (!len)
return 0;
- if (!len)
+ switch (action) {
+ case CRLF_CRLF:
+ break;
+ case CRLF_BINARY:
+ case CRLF_INPUT:
return 0;
+ case CRLF_GUESS:
+ if (auto_crlf == AUTO_CRLF_FALSE)
+ return 0;
+ /* fall through */
+ case CRLF_TEXT:
+ case CRLF_AUTO:
+ if (eol == EOL_LF ||
+ (eol == EOL_UNSET &&
+ (auto_crlf == AUTO_CRLF_INPUT ||
+ (auto_crlf == AUTO_CRLF_FALSE && EOL_NATIVE == EOL_LF))))
+ return 0;
+ }
gather_stats(src, len, &stats);
@@ -670,7 +679,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
{
struct git_attr_check check[5];
enum action action = CRLF_GUESS;
- enum eol eol = EOL_UNSET;
+ enum eol eol_attr = EOL_UNSET;
int ident = 0, ret = 0;
const char *filter = NULL;
@@ -682,7 +691,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
action = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
- eol = git_path_check_eol(path, check + 3);
+ eol_attr = git_path_check_eol(path, check + 3);
if (drv && drv->clean)
filter = drv->clean;
}
@@ -692,7 +701,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- action = determine_action(action, eol);
+ action = determine_action(action, eol_attr);
ret |= crlf_to_git(path, src, len, dst, action, checksafe);
if (ret) {
src = dst->buf;
@@ -705,7 +714,7 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
{
struct git_attr_check check[5];
enum action action = CRLF_GUESS;
- enum eol eol = EOL_UNSET;
+ enum eol eol_attr = EOL_UNSET;
int ident = 0, ret = 0;
const char *filter = NULL;
@@ -717,7 +726,7 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
action = git_path_check_crlf(path, check + 0);
ident = git_path_check_ident(path, check + 1);
drv = git_path_check_convert(path, check + 2);
- eol = git_path_check_eol(path, check + 3);
+ eol_attr = git_path_check_eol(path, check + 3);
if (drv && drv->smudge)
filter = drv->smudge;
}
@@ -727,7 +736,7 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
src = dst->buf;
len = dst->len;
}
- action = determine_action(action, eol);
+ action = determine_action(action, eol_attr);
ret |= crlf_to_worktree(path, src, len, dst, action);
if (ret) {
src = dst->buf;
diff --git a/environment.c b/environment.c
index db4a5e9..83d38d3 100644
--- a/environment.c
+++ b/environment.c
@@ -40,6 +40,7 @@ const char *editor_program;
const char *excludes_file;
enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
int read_replace_refs = 1;
+enum eol eol = EOL_UNSET;
enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
enum branch_track git_branch_track = BRANCH_TRACK_REMOTE;
diff --git a/t/t0026-eol-config.sh b/t/t0026-eol-config.sh
new file mode 100755
index 0000000..5b6c297
--- /dev/null
+++ b/t/t0026-eol-config.sh
@@ -0,0 +1,83 @@
+#!/bin/sh
+
+test_description='CRLF conversion'
+
+. ./test-lib.sh
+
+has_cr() {
+ tr '\015' Q <"$1" | grep Q >/dev/null
+}
+
+test_expect_success setup '
+
+ git config core.autocrlf false &&
+
+ echo "one text" > .gitattributes &&
+
+ for w in Hello world how are you; do echo $w; done >one &&
+ for w in I am very very fine thank you; do echo $w; done >two &&
+ git add . &&
+
+ git commit -m initial &&
+
+ one=`git rev-parse HEAD:one` &&
+ two=`git rev-parse HEAD:two` &&
+
+ echo happy.
+'
+
+test_expect_success 'eol=lf puts LFs in normalized file' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.eol lf &&
+ git read-tree --reset -u HEAD &&
+
+ ! has_cr one &&
+ ! has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_success 'eol=crlf puts CRLFs in normalized file' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.eol crlf &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ ! has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_success 'eol=lf overrides autocrlf=true' '
+
+ rm -f .gitattributes tmp one two &&
+ git config core.eol lf &&
+ git config core.autocrlf true &&
+ git read-tree --reset -u HEAD &&
+
+ ! has_cr one &&
+ ! has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_expect_success 'autocrlf=true overrides unset eol' '
+
+ rm -f .gitattributes tmp one two &&
+ git config --unset-all core.eol &&
+ git config core.autocrlf true &&
+ git read-tree --reset -u HEAD &&
+
+ has_cr one &&
+ has_cr two &&
+ onediff=`git diff one` &&
+ twodiff=`git diff two` &&
+ test -z "$onediff" -a -z "$twodiff"
+'
+
+test_done
--
1.7.1.5.gd739a
* Re: utf8 BOM
2010-05-15 20:23 ` Eyvind Bernhardsen
@ 2010-05-16 5:19 ` Dmitry Potapov
2010-05-16 10:37 ` Eyvind Bernhardsen
0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Potapov @ 2010-05-16 5:19 UTC (permalink / raw)
To: Eyvind Bernhardsen; +Cc: Robert Buck, git@vger.kernel.org List, msysGit
On Sat, May 15, 2010 at 10:23:52PM +0200, Eyvind Bernhardsen wrote:
> On 14. mai 2010, at 12.16, Dmitry Potapov wrote:
>
> > Probably, the ability to automatically add a utf8 BOM on Windows to text
> > files (which are marked as "unicode") could be helpful, but it is just a
> > part of the problem of how to deal with text files in "legacy" encodings,
> > which are still widely used on Windows.
>
> Sounds like something a clean/smudge filter should be able to do.
Yes, it should if you have a handful of files that need such conversion.
However, if you want it for every text file, running filters is slow
(especially on Windows), and they are not capable of autodetecting text.
> (which hopefully works no matter what your code
> page is? I don't know much about Windows i18n).
Yes, it does. I am not an expert on Windows either, but as far as I
know, BOMs are used to mark unicode files, which could be either UTF-8
or UTF-16. BTW, UTF-16 files are treated by Git as "binary" now, which may
not always be convenient, because it is impossible to "merge" or "diff" them.
> Adding this to convert.c would be more difficult, at least
> politically, since I assume it would be Windows-specific code.
I don't think it needs any Windows-specific code. We already have some
functions to convert text from different charsets, which could be used.
But this feature should be developed and tested by people who work on
Windows regularly and need this feature, because there is no substitute
for testing and experience of how well it works in practice. Currently,
I rarely use Windows and can get by with clean/smudge filters.
Dmitry
* Re: utf8 BOM
2010-05-16 5:19 ` Dmitry Potapov
@ 2010-05-16 10:37 ` Eyvind Bernhardsen
2010-05-16 11:26 ` Tait
0 siblings, 1 reply; 27+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-16 10:37 UTC (permalink / raw)
To: Dmitry Potapov; +Cc: Robert Buck, git@vger.kernel.org List, msysGit
On 16. mai 2010, at 07.19, Dmitry Potapov wrote:
> On Sat, May 15, 2010 at 10:23:52PM +0200, Eyvind Bernhardsen wrote:
>> (which hopefully works no matter what your code
>> page is? I don't know much about Windows i18n).
>
> Yes, it does. I am not an expert on Windows either, but as far as I
> know, BOMs are used to mark unicode files, which could be either UTF-8
> or UTF-16. BTW, UTF-16 files are treated by Git as "binary" now, which may
> not always be convenient, because it is impossible to "merge" or "diff" them.
Okay, so something that checks text files to see if they're utf16 (maybe just accept anything with a utf16 BOM as utf16?) and converts them to utf8 might be useful on any platform. Stripping utf8 BOMs and optionally re-adding them on output would be a natural extension. "core.autoutf", anyone?
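The "accept anything with a utf16 BOM as utf16" check could look roughly like this. It is a minimal sketch, not proposed git code: the enum and function names are hypothetical, and only detection is shown, not the conversion to utf8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a leading byte order mark. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/*
 * Inspects the first bytes of a buffer and reports which BOM, if any,
 * it starts with. Caveat: FF FE 00 00 is also the UTF-32 LE BOM, so a
 * real implementation would want to rule that out first.
 */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
		return BOM_UTF8;
	if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
		return BOM_UTF16_LE;
	if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
		return BOM_UTF16_BE;
	return BOM_NONE;
}
```

A caller would run this before the text/binary heuristic, since UTF-16 content is full of NUL bytes and would otherwise be classified as binary.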
>> Adding this to convert.c would be more difficult, at least
>> politically, since I assume it would be Windows-specific code.
>
> I don't think it needs any Windows-specific code. We already have some
> functions to convert text from different charsets, which could be used.
> But this feature should be developed and tested by people who work on
> Windows regularly and need this feature, because there is no substitute
> for testing and experience of how well it works in practice. Currently,
> I rarely use Windows and can get by with clean/smudge filters.
Yeah, the problem is finding someone who needs the feature _and_ is able/willing to implement it. I try to keep a Unix-like experience on Windows, so I don't usually run into utf8 BOMs.
--
Eyvind
* Re: [PATCH] Add "core.eol" variable to control end-of-line conversion
2010-05-15 20:47 ` [PATCH] Add "core.eol" variable to control end-of-line conversion Eyvind Bernhardsen
@ 2010-05-16 10:39 ` Robert Buck
0 siblings, 0 replies; 27+ messages in thread
From: Robert Buck @ 2010-05-16 10:39 UTC (permalink / raw)
To: Eyvind Bernhardsen
Cc: Linus Torvalds, git, msysGit, Junio C Hamano, Finn Arne Gangstad
On Sat, May 15, 2010 at 4:47 PM, Eyvind Bernhardsen
<eyvind.bernhardsen@gmail.com> wrote:
> Introduce a new configuration variable, "core.eol", that allows the user
> to set which line endings to use for end-of-line-normalized files in the
> working directory. It defaults to "native", which means CRLF on Windows
> and LF everywhere else.
>
> For backwards compatibility, "core.autocrlf" will override core.eol if
> core.eol is left unset. This means that
>
> [core]
> autocrlf = true
>
> will give CRLFs in the working directory even on platforms with LF as
> their native line ending.
>
> If core.eol is set explicitly (including setting it to "native"), it
> will override core.autocrlf so that
>
> [core]
> autocrlf = true
> eol = lf
>
> normalizes all files that look like text, but does not put CRLFs in the
> working directory.
>
> Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
> ---
>
> It turns out that my resistance to "core.eol" was mostly laziness, so I
> just implemented it.
>
> I decided that "core.autocrlf" has to override the native line ending if
> "core.eol" isn't set explicitly, which gives some extra complexity in
> convert.c.
>
> For 1.8 I would consider making core.autocrlf just turn on normalization
> and leave the working directory line ending decision to core.eol, but
> that _will_ break people's setups.
>
> Patch is on top of my latest series.
> --
> Eyvind
Looking forward to this change. In terms of usability it is really
nice. Eager to see it in a release.
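For illustration, the precedence rules described in the quoted commit message can be sketched as below. This is a minimal sketch in Python, not the code from convert.c: the function name, the boolean `autocrlf` parameter (which models only "core.autocrlf = true" versus unset) and the `native_is_crlf` flag are assumptions made for the example.

```python
from typing import Optional

VALID_EOL = ("lf", "crlf", "native")


def working_tree_eol(core_eol: Optional[str], autocrlf: bool,
                     native_is_crlf: bool) -> str:
    """Return "lf" or "crlf" for normalized files in the working directory.

    core_eol is None when core.eol is unset (hypothetical modelling).
    autocrlf is True only for "core.autocrlf = true" (a simplification).
    native_is_crlf says whether the platform's native ending is CRLF.
    """
    native = "crlf" if native_is_crlf else "lf"
    if core_eol is not None:
        if core_eol not in VALID_EOL:
            raise ValueError("unknown core.eol value: %r" % core_eol)
        # An explicit core.eol (even "native") overrides core.autocrlf.
        return native if core_eol == "native" else core_eol
    if autocrlf:
        # Backwards compatibility: autocrlf = true gives CRLF everywhere.
        return "crlf"
    # Nothing set: core.eol defaults to "native".
    return native
```

With these assumptions, `working_tree_eol(None, True, False)` models the first quoted config on an LF platform, and `working_tree_eol("lf", True, False)` models the second.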
* Re: utf8 BOM
2010-05-16 10:37 ` Eyvind Bernhardsen
@ 2010-05-16 11:26 ` Tait
2010-05-16 13:32 ` Dmitry Potapov
0 siblings, 1 reply; 27+ messages in thread
From: Tait @ 2010-05-16 11:26 UTC (permalink / raw)
To: git@vger.kernel.org List; +Cc: Eyvind Bernhardsen, Dmitry Potapov, Robert Buck
> Okay, so something that checks text files to see if they're utf...
> "core.autoutf", anyone?
This (and crlf-conversion, for that matter) strikes me as something best
handled outside of git core, such as through checkout/commit hooks. Perhaps
examples of such hooks could be provided and adapted by each project and
user as that user/project sees fit for their specific choice of repository
format and development environment.
Given that git already chose not to screw around with encodings or define
a canonical encoding for the on-disk format (it's just a string of bytes),
it would be consistent and reasonable to not mess with these other things,
too.
Tait
* Re: utf8 BOM
2010-05-16 11:26 ` Tait
@ 2010-05-16 13:32 ` Dmitry Potapov
0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Potapov @ 2010-05-16 13:32 UTC (permalink / raw)
To: Tait; +Cc: git@vger.kernel.org List, Eyvind Bernhardsen, Robert Buck
On Sun, May 16, 2010 at 04:26:12AM -0700, Tait wrote:
> > Okay, so something that checks text files to see if they're utf...
> > "core.autoutf", anyone?
>
> This (and crlf-conversion, for that matter) strikes me as something best
> handled outside of git core, such as through checkout/commit hooks. Perhaps
> examples of such hooks could be provided and adapted by each project and
> user as that user/project sees fit for their specific choice of repository
> format and development environment.
There are a few problems with using filters for crlf conversion:
1. It is way too slow... Running a script for each file in a repo
is slow even on Linux, and on Windows it is going to be horribly slow.
2. You have to install this filter in every clone, and by the time
you install it, your repository is already checked out with the wrong
line endings. So you need to fix that.
While using scripts is good where you need flexibility, that is not the
case with crlf conversion. Users want it to just work, and they want
simple, easy-to-understand rules for marking which files should and
should not be converted. If every project goes with its own rules
and scripts, it is going to be a big mess.
Now, when we speak about charset encoding, it could make sense to try
this new feature as a filter, but if it is something that is to be used
widely, it should eventually be rewritten in C.
Dmitry
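To make the filter approach under discussion concrete, here is a rough sketch of what a script-based crlf filter might look like. It is an illustration only, written in Python with hypothetical function names; nobody in the thread posted such a script, and it makes no attempt to detect binary files.

```python
from typing import BinaryIO


def clean(data: bytes) -> bytes:
    """Normalize CRLF to LF, as would be done before storing content."""
    return data.replace(b"\r\n", b"\n")


def smudge(data: bytes) -> bytes:
    """Convert to CRLF, as would be done when writing the working tree."""
    # Normalize first so an existing CRLF does not become CR CR LF.
    return clean(data).replace(b"\n", b"\r\n")


def run_filter(mode: str, src: BinaryIO, dst: BinaryIO) -> None:
    """Read all of src, convert according to mode, write to dst."""
    data = src.read()
    if mode == "clean":
        dst.write(clean(data))
    elif mode == "smudge":
        dst.write(smudge(data))
    else:
        raise ValueError("mode must be 'clean' or 'smudge'")
```

A real driver would call something like `run_filter(sys.argv[1], sys.stdin.buffer, sys.stdout.buffer)` and would have to be configured in each clone, with one process started per file. Those are the two costs listed in the message above.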
end of thread (~2010-05-16 13:33 UTC)
Thread overview: 27+ messages
2010-05-12 23:00 [PATCH v3 0/5] End-of-line normalization, redesigned Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 1/5] autocrlf: Make it work also for un-normalized repositories Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 2/5] Add tests for per-repository eol normalization Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 3/5] Add " Eyvind Bernhardsen
2010-05-12 23:00 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
2010-05-13 1:38 ` Linus Torvalds
2010-05-13 9:39 ` Robert Buck
2010-05-13 9:58 ` Robert Buck
2010-05-13 11:47 ` Eyvind Bernhardsen
2010-05-13 13:19 ` Robert Buck
2010-05-14 10:16 ` utf8 BOM Dmitry Potapov
2010-05-15 20:23 ` Eyvind Bernhardsen
2010-05-16 5:19 ` Dmitry Potapov
2010-05-16 10:37 ` Eyvind Bernhardsen
2010-05-16 11:26 ` Tait
2010-05-16 13:32 ` Dmitry Potapov
2010-05-13 10:59 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
2010-05-13 21:45 ` Linus Torvalds
2010-05-14 2:34 ` Robert Buck
2010-05-14 4:56 ` Jonathan Nieder
2010-05-14 21:21 ` Eyvind Bernhardsen
2010-05-14 21:32 ` Eyvind Bernhardsen
2010-05-14 21:16 ` Eyvind Bernhardsen
2010-05-14 21:27 ` Linus Torvalds
2010-05-15 20:47 ` [PATCH] Add "core.eol" variable to control end-of-line conversion Eyvind Bernhardsen
2010-05-16 10:39 ` Robert Buck
2010-05-12 23:00 ` [RFC/PATCH v3 5/5] Rename "core.autocrlf" config variable as "core.eolconv" Eyvind Bernhardsen