* [PATCH v4 0/3] CRLF merge conflict reduction, take 4
@ 2010-06-27 19:43 Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 1/3] Avoid conflicts when merging branches with mixed normalization Eyvind Bernhardsen
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-27 19:43 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Sixt, Finn Arne Gangstad, git@vger.kernel.org List
Only a few changes since the last round:
- added some paragraphs to the filter documentation in gitattributes to
explain why having normalizing filters is a good thing (comments
welcome)
- fixed the problem Johannes spotted in the eol=crlf optimization by
expanding CRLFs (ie not optimizing) if a smudge filter is configured
- moved the opening brace in my function definitions to the next line
Thanks for your input. I'm pretty happy with this now.
--
Eyvind
Eyvind Bernhardsen (3):
Avoid conflicts when merging branches with mixed normalization
Try normalizing files to avoid delete/modify conflicts when merging
Don't expand CRLFs when normalizing text during merge
Documentation/gitattributes.txt | 27 ++++++++++++++++++
cache.h | 1 +
convert.c | 37 +++++++++++++++++++++----
ll-merge.c | 13 +++++++++
merge-recursive.c | 44 ++++++++++++++++++++++++++++-
t/t6038-merge-text-auto.sh | 58 +++++++++++++++++++++++++++++++++++++++
6 files changed, 172 insertions(+), 8 deletions(-)
create mode 100755 t/t6038-merge-text-auto.sh
--
1.7.1.575.g383de
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 1/3] Avoid conflicts when merging branches with mixed normalization
2010-06-27 19:43 [PATCH v4 0/3] CRLF merge conflict reduction, take 4 Eyvind Bernhardsen
@ 2010-06-27 19:43 ` Eyvind Bernhardsen
2010-06-28 8:02 ` Finn Arne Gangstad
2010-06-27 19:43 ` [PATCH v4 2/3] Try normalizing files to avoid delete/modify conflicts when merging Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 3/3] Don't expand CRLFs when normalizing text during merge Eyvind Bernhardsen
2 siblings, 1 reply; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-27 19:43 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Sixt, Finn Arne Gangstad, git@vger.kernel.org List
Currently, merging across changes in line ending normalization is
painful since files containing CRLF will conflict with normalized files,
even if the only difference between the two versions is the line
endings. Additionally, any "real" merge conflicts that exist are
obscured because every line in the file has a conflict.
Assume you start out with a repo that has a lot of text files with CRLF
checked in (A):
o---C
/ \
A---B---D
B: Add "* text=auto" to .gitattributes and normalize all files to
LF-only
C: Modify some of the text files
D: Try to merge C
You will get a ridiculous number of LF/CRLF conflicts when trying to
merge C into D, since the repository contents for C are "wrong" wrt the
new .gitattributes file.
Fix ll-merge so that the "base", "theirs" and "ours" stages are passed
through convert_to_worktree() and convert_to_git() before a three-way
merge. This ensures that all three stages are normalized in the same
way, removing from consideration differences that are only due to
normalization.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
Documentation/gitattributes.txt | 27 ++++++++++++++++++
cache.h | 1 +
convert.c | 16 +++++++++-
ll-merge.c | 13 +++++++++
t/t6038-merge-text-auto.sh | 58 +++++++++++++++++++++++++++++++++++++++
5 files changed, 113 insertions(+), 2 deletions(-)
create mode 100755 t/t6038-merge-text-auto.sh
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 564586b..b110082 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -317,6 +317,18 @@ command is "cat").
smudge = cat
------------------------
+For best results, `clean` and `smudge` commands should produce output
+that is not dependent on the corresponding command having been run.
+That is, `clean` should produce identical output whether its input has
+been run through `smudge` or not, and `smudge` should not rely on its
+input having been run through `clean`. See the section on merging
+below for a rationale.
+
+The example "indent" filter is well-behaved in this regard: it will
+accept input that is already correctly indented without modifying it.
+In this case, the lack of a smudge filter means that the clean filter
+_must_ accept its own output without modifying it.
+
Interaction between checkin/checkout attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -331,6 +343,21 @@ In the check-out codepath, the blob content is first converted
with `text`, and then `ident` and fed to `filter`.
+Merging branches with differing checkin/checkout attributes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To prevent unnecessary merge conflicts, git runs a virtual check-out
+and check-in of all three stages of a file when resolving a three-way
+merge. This prevents changes caused by check-in conversion from
+causing spurious merge conflicts when a converted file is merged with
+an unconverted file.
+
+This strategy will break down if a `smudge` filter relies on its input
+having been processed by the corresponding `clean` filter or vice
+versa. Such filters may otherwise work well, but will prevent
+automatic merging.
+
+
Generating diff text
~~~~~~~~~~~~~~~~~~~~
diff --git a/cache.h b/cache.h
index ff4a7c2..5db89f9 100644
--- a/cache.h
+++ b/cache.h
@@ -1043,6 +1043,7 @@ extern void trace_argv_printf(const char **argv, const char *format, ...);
extern int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe);
extern int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst);
+extern int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst);
/* add */
/*
diff --git a/convert.c b/convert.c
index e41a31e..0203be8 100644
--- a/convert.c
+++ b/convert.c
@@ -93,7 +93,8 @@ static int is_binary(unsigned long size, struct text_stat *stats)
return 0;
}
-static enum eol determine_output_conversion(enum action action) {
+static enum eol determine_output_conversion(enum action action)
+{
switch (action) {
case CRLF_BINARY:
return EOL_UNSET;
@@ -693,7 +694,8 @@ static int git_path_check_ident(const char *path, struct git_attr_check *check)
return !!ATTR_TRUE(value);
}
-enum action determine_action(enum action text_attr, enum eol eol_attr) {
+static enum action determine_action(enum action text_attr, enum eol eol_attr)
+{
if (text_attr == CRLF_BINARY)
return CRLF_BINARY;
if (eol_attr == EOL_LF)
@@ -773,3 +775,13 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
}
return ret | apply_filter(path, src, len, dst, filter);
}
+
+int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
+{
+ int ret = convert_to_working_tree(path, src, len, dst);
+ if (ret) {
+ src = dst->buf;
+ len = dst->len;
+ }
+ return ret | convert_to_git(path, src, len, dst, 0);
+}
diff --git a/ll-merge.c b/ll-merge.c
index 3764a1a..28c6f54 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -321,6 +321,16 @@ static int git_path_check_merge(const char *path, struct git_attr_check check[2]
return git_checkattr(path, 2, check);
}
+static void normalize_file(mmfile_t *mm, const char *path)
+{
+ struct strbuf strbuf = STRBUF_INIT;
+ if (renormalize_buffer(path, mm->ptr, mm->size, &strbuf)) {
+ free(mm->ptr);
+ mm->size = strbuf.len;
+ mm->ptr = strbuf_detach(&strbuf, NULL);
+ }
+}
+
int ll_merge(mmbuffer_t *result_buf,
const char *path,
mmfile_t *ancestor, const char *ancestor_label,
@@ -334,6 +344,9 @@ int ll_merge(mmbuffer_t *result_buf,
const struct ll_merge_driver *driver;
int virtual_ancestor = flag & 01;
+ normalize_file(ancestor, path);
+ normalize_file(ours, path);
+ normalize_file(theirs, path);
if (!git_path_check_merge(path, check)) {
ll_driver_name = check[0].value;
if (check[1].value) {
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
new file mode 100755
index 0000000..44e6003
--- /dev/null
+++ b/t/t6038-merge-text-auto.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description='CRLF merge conflict across text=auto change'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+ git config core.autocrlf false &&
+ echo first line | append_cr >file &&
+ git add file &&
+ git commit -m "Initial" &&
+ git tag initial &&
+ git branch side &&
+ echo "* text=auto" >.gitattributes &&
+ touch file &&
+ git add .gitattributes file &&
+ git commit -m "normalize file" &&
+ echo same line | append_cr >>file &&
+ git add file &&
+ git commit -m "add line from a" &&
+ git tag a &&
+ git rm .gitattributes &&
+ rm file &&
+ git checkout file &&
+ git commit -m "remove .gitattributes" &&
+ git tag c &&
+ git checkout side &&
+ echo same line | append_cr >>file &&
+ git commit -m "add line from b" file &&
+ git tag b &&
+ git checkout master
+'
+
+test_expect_success 'Check merging after setting text=auto' '
+ git reset --hard a &&
+ git merge b &&
+ cat file | remove_cr >file.temp &&
+ test_cmp file file.temp
+'
+
+test_expect_success 'Check merging addition of text=auto' '
+ git reset --hard b &&
+ git merge a &&
+ cat file | remove_cr >file.temp &&
+ test_cmp file file.temp
+'
+
+test_expect_failure 'Test delete/normalize conflict' '
+ git checkout side &&
+ git reset --hard initial &&
+ git rm file &&
+ git commit -m "remove file" &&
+ git checkout master &&
+ git reset --hard a^ &&
+ git merge side
+'
+
+test_done
--
1.7.1.575.g383de
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v4 2/3] Try normalizing files to avoid delete/modify conflicts when merging
2010-06-27 19:43 [PATCH v4 0/3] CRLF merge conflict reduction, take 4 Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 1/3] Avoid conflicts when merging branches with mixed normalization Eyvind Bernhardsen
@ 2010-06-27 19:43 ` Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 3/3] Don't expand CRLFs when normalizing text during merge Eyvind Bernhardsen
2 siblings, 0 replies; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-27 19:43 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Sixt, Finn Arne Gangstad, git@vger.kernel.org List
If a file is modified due to normalization on one branch, and deleted on
another, a merge of the two branches will result in a delete/modify
conflict for that file even if it is otherwise unchanged.
Try to avoid the conflict by normalizing and comparing the "base" file
and the modified file when their sha1s differ. If they compare equal,
the file is considered unmodified and is deleted.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
merge-recursive.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
t/t6038-merge-text-auto.sh | 2 +-
2 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index 206c103..f4f09a2 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1056,6 +1056,44 @@ static unsigned char *stage_sha(const unsigned char *sha, unsigned mode)
return (is_null_sha1(sha) || mode == 0) ? NULL: (unsigned char *)sha;
}
+static int read_sha1_strbuf(const unsigned char *sha, const char *path,
+ struct strbuf *dst)
+{
+ void *buf;
+ enum object_type type;
+ unsigned long size;
+ buf = read_sha1_file(sha, &type, &size);
+ if (!buf)
+ return 0;
+ if (type != OBJ_BLOB) {
+ free(buf);
+ return 0;
+ }
+ strbuf_attach(dst, buf, size, size + 1);
+ return 1;
+}
+
+static int normalized_eq(const unsigned char *a_sha,
+ const unsigned char *b_sha,
+ const char *path)
+{
+ struct strbuf a = STRBUF_INIT;
+ struct strbuf b = STRBUF_INIT;
+ int ret = 0;
+ if (a_sha && b_sha &&
+ read_sha1_strbuf(a_sha, path, &a) &&
+ read_sha1_strbuf(b_sha, path, &b)) {
+ /* Both files must be normalized, so we can't use || */
+ if ((renormalize_buffer(path, a.buf, a.len, &a) |
+ renormalize_buffer(path, b.buf, b.len, &b)) &&
+ (a.len == b.len))
+ ret = memcmp(a.buf, b.buf, a.len) == 0;
+ }
+ strbuf_release(&a);
+ strbuf_release(&b);
+ return ret;
+}
+
/* Per entry merge function */
static int process_entry(struct merge_options *o,
const char *path, struct stage_data *entry)
@@ -1075,8 +1113,10 @@ static int process_entry(struct merge_options *o,
if (o_sha && (!a_sha || !b_sha)) {
/* Case A: Deleted in one */
if ((!a_sha && !b_sha) ||
- (sha_eq(a_sha, o_sha) && !b_sha) ||
- (!a_sha && sha_eq(b_sha, o_sha))) {
+ (!b_sha && sha_eq(a_sha, o_sha) ||
+ normalized_eq(a_sha, o_sha, path)) ||
+ (!a_sha && sha_eq(b_sha, o_sha) ||
+ normalized_eq(b_sha, o_sha, path))) {
/* Deleted in both or deleted in one and
* unchanged in the other */
if (a_sha)
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 44e6003..5e45256 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -45,7 +45,7 @@ test_expect_success 'Check merging addition of text=auto' '
test_cmp file file.temp
'
-test_expect_failure 'Test delete/normalize conflict' '
+test_expect_success 'Test delete/normalize conflict' '
git checkout side &&
git reset --hard initial &&
git rm file &&
--
1.7.1.575.g383de
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v4 3/3] Don't expand CRLFs when normalizing text during merge
2010-06-27 19:43 [PATCH v4 0/3] CRLF merge conflict reduction, take 4 Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 1/3] Avoid conflicts when merging branches with mixed normalization Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 2/3] Try normalizing files to avoid delete/modify conflicts when merging Eyvind Bernhardsen
@ 2010-06-27 19:43 ` Eyvind Bernhardsen
2 siblings, 0 replies; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-27 19:43 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Sixt, Finn Arne Gangstad, git@vger.kernel.org List
Disable CRLF expansion when convert_to_working_tree() is called from
normalize_buffer(). This improves performance when merging branches
with conflicting line endings when core.eol=crlf or core.autocrlf=true
by making the normalization act as if core.eol=lf.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
convert.c | 27 ++++++++++++++++++++-------
1 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/convert.c b/convert.c
index 0203be8..01de9a8 100644
--- a/convert.c
+++ b/convert.c
@@ -741,7 +741,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
return ret | ident_to_git(path, src, len, dst, ident);
}
-int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
+static int convert_to_working_tree_internal(const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ int normalizing)
{
struct git_attr_check check[5];
enum action action = CRLF_GUESS;
@@ -767,18 +769,29 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
src = dst->buf;
len = dst->len;
}
- action = determine_action(action, eol_attr);
- ret |= crlf_to_worktree(path, src, len, dst, action);
- if (ret) {
- src = dst->buf;
- len = dst->len;
+ /*
+ * CRLF conversion can be skipped if normalizing, unless there
+ * is a smudge filter. The filter might expect CRLFs.
+ */
+ if (filter || !normalizing) {
+ action = determine_action(action, eol_attr);
+ ret |= crlf_to_worktree(path, src, len, dst, action);
+ if (ret) {
+ src = dst->buf;
+ len = dst->len;
+ }
}
return ret | apply_filter(path, src, len, dst, filter);
}
+int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
+{
+ return convert_to_working_tree_internal(path, src, len, dst, 0);
+}
+
int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
{
- int ret = convert_to_working_tree(path, src, len, dst);
+ int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
if (ret) {
src = dst->buf;
len = dst->len;
--
1.7.1.575.g383de
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v4 1/3] Avoid conflicts when merging branches with mixed normalization
2010-06-27 19:43 ` [PATCH v4 1/3] Avoid conflicts when merging branches with mixed normalization Eyvind Bernhardsen
@ 2010-06-28 8:02 ` Finn Arne Gangstad
2010-06-28 19:32 ` [PATCH] Clarify text filter merge conflict reduction docs Eyvind Bernhardsen
0 siblings, 1 reply; 14+ messages in thread
From: Finn Arne Gangstad @ 2010-06-28 8:02 UTC (permalink / raw)
To: Eyvind Bernhardsen
Cc: Junio C Hamano, Johannes Sixt, git@vger.kernel.org List
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -317,6 +317,18 @@ command is "cat").
> smudge = cat
> ------------------------
>
> +For best results, `clean` and `smudge` commands should produce output
> +that is not dependent on the corresponding command having been run.
> +That is, `clean` should produce identical output whether its input has
> +been run through `smudge` or not, and `smudge` should not rely on its
> +input having been run through `clean`. See the section on merging
> +below for a rationale.
I think this is marginally unclear, what about:
Clean should not alter its output further if run again
clean(x) == clean(clean(x))
Smudge should not alter the output of clean
clean(x) == clean(smudge(x))
It should not matter that smudge will do something weird if clean
hasn't been run, as long as clean(x) == clean(smudge(x)) still
holds. I also think it is worth mentioning explicitly that clean can
be run multiple times, you should not have to infer this.
> [...]
> +Merging branches with differing checkin/checkout attributes
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Maybe something about when this happens, or even put it in the header
instead? Something like
If you have added attributes to a file that cause the canonical
repository format for that file to change, such as adding a
clean/smudge filter or text/eol/ident attributes, merging anything based
on a point in time where the attribute was not in place would normally
cause merge conflicts.
> +
> +To prevent unnecessary merge conflicts, git runs a virtual check-out
> +and check-in of all three stages of a file when resolving a three-way
> +merge. This prevents changes caused by check-in conversion from
> +causing spurious merge conflicts when a converted file is merged with
> +an unconverted file.
- Finn Arne
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] Clarify text filter merge conflict reduction docs
2010-06-28 8:02 ` Finn Arne Gangstad
@ 2010-06-28 19:32 ` Eyvind Bernhardsen
2010-06-28 20:31 ` Finn Arne Gangstad
2010-06-29 16:19 ` Junio C Hamano
0 siblings, 2 replies; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-28 19:32 UTC (permalink / raw)
To: Finn Arne Gangstad; +Cc: Junio C Hamano, git@vger.kernel.org List
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
---
How does this look?
This commit should really be squashed into the first one in the series.
Live and learn, next time I'll add new changes in a new commit and put
it _last_. I'll submit a fixed series once we're happy with the
documentation.
- Eyvind
Documentation/gitattributes.txt | 46 ++++++++++++++++++++++-----------------
1 files changed, 26 insertions(+), 20 deletions(-)
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index b110082..22400c1 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -317,17 +317,16 @@ command is "cat").
smudge = cat
------------------------
-For best results, `clean` and `smudge` commands should produce output
-that is not dependent on the corresponding command having been run.
-That is, `clean` should produce identical output whether its input has
-been run through `smudge` or not, and `smudge` should not rely on its
-input having been run through `clean`. See the section on merging
-below for a rationale.
+For best results, `clean` should not alter its output further if it is
+run twice ("clean->clean" should be equivalent to "clean"), and
+multiple `smudge` commands should not alter `clean`'s output
+("smudge->smudge->clean" should be equivalent to "clean"). See the
+section on merging below.
-The example "indent" filter is well-behaved in this regard: it will
-accept input that is already correctly indented without modifying it.
-In this case, the lack of a smudge filter means that the clean filter
-_must_ accept its own output without modifying it.
+The "indent" filter is well-behaved in this regard: it will not modify
+input that is already correctly indented. In this case, the lack of a
+smudge filter means that the clean filter _must_ accept its own output
+without modifying it.
Interaction between checkin/checkout attributes
@@ -346,16 +345,23 @@ with `text`, and then `ident` and fed to `filter`.
Merging branches with differing checkin/checkout attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-To prevent unnecessary merge conflicts, git runs a virtual check-out
-and check-in of all three stages of a file when resolving a three-way
-merge. This prevents changes caused by check-in conversion from
-causing spurious merge conflicts when a converted file is merged with
-an unconverted file.
-
-This strategy will break down if a `smudge` filter relies on its input
-having been processed by the corresponding `clean` filter or vice
-versa. Such filters may otherwise work well, but will prevent
-automatic merging.
+If you have added attributes to a file that cause the canonical
+repository format for that file to change, such as adding a
+clean/smudge filter or text/eol/ident attributes, merging anything
+where the attribute is not in place would normally cause merge
+conflicts.
+
+To prevent these unnecessary merge conflicts, git runs a virtual
+check-out and check-in of all three stages of a file when resolving a
+three-way merge. This prevents changes caused by check-in conversion
+from causing spurious merge conflicts when a converted file is merged
+with an unconverted file.
+
+As long as a "smudge->clean" results in the same output as a "clean"
+even on files that are already smudged, this strategy will
+automatically resolve all filter-related conflicts. Filters that do
+not act in this way may cause additional merge conflicts that must be
+resolved manually.
Generating diff text
--
1.7.1.575.g383de
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-28 19:32 ` [PATCH] Clarify text filter merge conflict reduction docs Eyvind Bernhardsen
@ 2010-06-28 20:31 ` Finn Arne Gangstad
2010-06-29 16:19 ` Junio C Hamano
1 sibling, 0 replies; 14+ messages in thread
From: Finn Arne Gangstad @ 2010-06-28 20:31 UTC (permalink / raw)
To: Eyvind Bernhardsen; +Cc: Junio C Hamano, git@vger.kernel.org List
On Mon, Jun 28, 2010 at 09:32:50PM +0200, Eyvind Bernhardsen wrote:
> How does this look?
>
> This commit should really be squashed into the first one in the series.
> Live and learn, next time I'll add new changes in a new commit and put
> it _last_. I'll submit a fixed series once we're happy with the
> documentation.
Looks good I think!
- Finn Arne
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-28 19:32 ` [PATCH] Clarify text filter merge conflict reduction docs Eyvind Bernhardsen
2010-06-28 20:31 ` Finn Arne Gangstad
@ 2010-06-29 16:19 ` Junio C Hamano
2010-06-29 21:18 ` Eyvind Bernhardsen
2010-06-30 8:20 ` Eyvind Bernhardsen
1 sibling, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2010-06-29 16:19 UTC (permalink / raw)
To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, git@vger.kernel.org List
Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> writes:
> Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
> ---
> How does this look?
Looks Ok (I didn't read _this_ patch but read a squashed-in result),
thanks.
> +If you have added attributes to a file that cause...
> +...To prevent these unnecessary merge conflicts,
This naturally calls for an optimization idea, doesn't it?
I wonder if ll_merge should gain another flag bit to disable the calls to
normalize_file(), so that the whole thing can be skipped when the caller
somehow knows .gitattributes files that govern the path didn't change.
That won't be a trivial optimization and my gut feeling is that it
shouldn't be part of this series.
I do however wonder if this should be initially introduced as an
experimental feature, guarded with a configuration option for brave souls
to try it out, and flip the feature on by default after we gain confidence
in it, both in performance and in correctness.
-- >8 --
Introduce "double conversion during merge" more gradually
This marks the recent improvement to the merge machinery that helps people
who changed their mind between CRLF/LF an opt in feature, so that we can
more easily release it early to everybody, without fear of breaking the
majority of users (read: on POSIX) that don't need it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Documentation/config.txt | 10 ++++++++++
Documentation/gitattributes.txt | 5 +++--
cache.h | 1 +
config.c | 5 +++++
environment.c | 1 +
ll-merge.c | 8 +++++---
6 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4c49104..ad2a27e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -538,6 +538,16 @@ core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
+core.doubleConvert::
+ Tell git that canonical representation of files in the repository
+ has changed over time (e.g. earlier commits record text files
+ with CRLF line endings, but recent ones use LF line endings). In
+ such a repository, git is forced to convert the data recorded in
+ commits twice before performing a merge to reduce unnecessary
+ conflicts. For more information, see section
+ "Merging branches with differing checkin/checkout attributes" in
+ linkgit:gitattributes[5].
+
add.ignore-errors::
Tells 'git add' to continue adding files when some files cannot be
added due to indexing errors. Equivalent to the '--ignore-errors'
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 22400c1..504d5ca 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -351,9 +351,10 @@ clean/smudge filter or text/eol/ident attributes, merging anything
where the attribute is not in place would normally cause merge
conflicts.
-To prevent these unnecessary merge conflicts, git runs a virtual
+To prevent these unnecessary merge conflicts, git can be told to run a virtual
check-out and check-in of all three stages of a file when resolving a
-three-way merge. This prevents changes caused by check-in conversion
+three-way merge by setting `core.doubleConvert` configuration variable.
+This prevents changes caused by check-in conversion
from causing spurious merge conflicts when a converted file is merged
with an unconverted file.
diff --git a/cache.h b/cache.h
index aa725b0..217f1e9 100644
--- a/cache.h
+++ b/cache.h
@@ -551,6 +551,7 @@ extern int read_replace_refs;
extern int fsync_object_files;
extern int core_preload_index;
extern int core_apply_sparse_checkout;
+extern int core_ll_merge_double_convert;
enum safe_crlf {
SAFE_CRLF_FALSE = 0,
diff --git a/config.c b/config.c
index cdcf583..bea054c 100644
--- a/config.c
+++ b/config.c
@@ -595,6 +595,11 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.doubleconvert")) {
+ core_ll_merge_double_convert = git_config_bool(var, value);
+ return 0;
+ }
+
/* Add other config variables here and to Documentation/config.txt. */
return 0;
}
diff --git a/environment.c b/environment.c
index 83d38d3..a8f04e7 100644
--- a/environment.c
+++ b/environment.c
@@ -53,6 +53,7 @@ enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
char *notes_ref_name;
int grafts_replace_parents = 1;
int core_apply_sparse_checkout;
+int core_ll_merge_double_convert;
/* Parallel index stat data preload? */
int core_preload_index = 0;
diff --git a/ll-merge.c b/ll-merge.c
index 28c6f54..8831631 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -344,9 +344,11 @@ int ll_merge(mmbuffer_t *result_buf,
const struct ll_merge_driver *driver;
int virtual_ancestor = flag & 01;
- normalize_file(ancestor, path);
- normalize_file(ours, path);
- normalize_file(theirs, path);
+ if (core_ll_merge_double_convert) {
+ normalize_file(ancestor, path);
+ normalize_file(ours, path);
+ normalize_file(theirs, path);
+ }
if (!git_path_check_merge(path, check)) {
ll_driver_name = check[0].value;
if (check[1].value) {
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-29 16:19 ` Junio C Hamano
@ 2010-06-29 21:18 ` Eyvind Bernhardsen
2010-06-30 17:46 ` Junio C Hamano
2010-06-30 8:20 ` Eyvind Bernhardsen
1 sibling, 1 reply; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-29 21:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Finn Arne Gangstad, git@vger.kernel.org List
On 29. juni 2010, at 18.19, Junio C Hamano wrote:
> I do however wonder if this should be initially introduced as an
> experimental feature, guarded with a configuration option for brave souls
> to try it out, and flip the feature on by default after we gain confidence
> in it, both in performance and in correctness.
Agreed, and thanks. Messing with the merge machinery unnerves me a little.
Shouldn't the normalization in merge-recursive be conditional too?
Something like this squashed into the delete/modify patch:
--8<--
diff --git a/merge-recursive.c b/merge-recursive.c
index a2c174f..49bd3d2 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1079,6 +1079,8 @@ static int normalized_eq(const unsigned char *a_sha,
{
struct strbuf a = STRBUF_INIT;
struct strbuf b = STRBUF_INIT;
+ if (!core_ll_merge_double_convert)
+ return 0;
int ret = 0;
if (a_sha && b_sha &&
read_sha1_strbuf(a_sha, path, &a) &&
--8<--
Sorry about the whitespace-damaged inline diff, I'll redo the series if necessary.
--
Eyvind
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-29 16:19 ` Junio C Hamano
2010-06-29 21:18 ` Eyvind Bernhardsen
@ 2010-06-30 8:20 ` Eyvind Bernhardsen
2010-06-30 15:15 ` Junio C Hamano
1 sibling, 1 reply; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-30 8:20 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Finn Arne Gangstad, git@vger.kernel.org List
On 29. juni 2010, at 18.19, Junio C Hamano wrote:
> Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> writes:
[...]
>> +If you have added attributes to a file that cause...
>> +...To prevent these unnecessary merge conflicts,
>
> This naturally calls for an optimization idea, doesn't it?
>
> I wonder if ll_merge should gain another flag bit to disable the calls to
> normalize_file(), so that the whole thing can be skipped when the caller
> somehow knows .gitattributes files that govern the path didn't change.
>
> That won't be a trivial optimization and my gut feeling is that it
> shouldn't be part of this series.
Are you thinking that we could check changes in .gitattributes during a merge and only turn on normalization for those files where relevant attributes have changed? I like it, but I agree with your gut, especially since normalization has to be enabled manually.
> I do however wonder if this should be initially introduced as an
> experimental feature, guarded with a configuration option for brave souls
> to try it out, and flip the feature on by default after we gain confidence
> in it, both in performance and in correctness.
My fix to add the configuration option to the delete/modify patch yesterday was pretty bad, sorry. My only excuse is that I was in a hurry, I'll resend the series tonight with a better fix.
--
Eyvind Bernhardsen
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-30 8:20 ` Eyvind Bernhardsen
@ 2010-06-30 15:15 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2010-06-30 15:15 UTC (permalink / raw)
To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, git@vger.kernel.org List
Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> writes:
> Are you thinking that we could check changes in .gitattributes during a
> merge and only turn on normalization for those files where relevant
> attributes have changed?
Nothing that elaborate.
I was envisioning that we would compare object names of the .gitattributes
files in directories that lead to the path being merged in three trees,
and we use the new slowpath unless all three match. You could look _into_
the actual contents of .gitattributes and decide that a particular change
does not affect the path you are merging, but I don't think it is worth
it; sane people are expected not to flip CRLF/LF around many times a day
anyway, so changes to .gitattributes should already be rare events.
We will be walking the trees in parallel while merging anyway, so when you
have to merge a/b/c.txt, we would already have opened the top-level tree,
tree "a", and tree "a/b" already, and we should be able pick up the object
name of .gitattributes, a/.gitattributes and b/.gitattributes cheaply
without opening any extra object.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-29 21:18 ` Eyvind Bernhardsen
@ 2010-06-30 17:46 ` Junio C Hamano
2010-06-30 21:32 ` Eyvind Bernhardsen
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2010-06-30 17:46 UTC (permalink / raw)
To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, git@vger.kernel.org List
Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> writes:
> Shouldn't the normalization in merge-recursive be conditional too?
True, but your patch to merge-recursive is broken, I think. It should at
least look like the attached rewrite.
Key points:
- Your "normalized_eq()" is called only from the codepath that wants to
know if "one is deleted and the other is unchanged" to implement the
latter half of that check. Let's name that function blob_unchanged().
Also let's move the "how to compare two blobs when we are not doing
double conversion" (i.e. sha_eq()) logic from its caller to it. These
changes would make it a lot easier to read the caller.
- Your read_sha1_strbuf() took "path" as input but didn't do anything
with it. Let's drop it.
- It didn't diagnose any errors. Let's make sure that it follows the
usual "0 for success, negative for error" convention and make sure its
only caller knows about it.
- "blob_unchanged()" (aka "normalized_eq()") is called only when we have
two blobs to compare. It is a programming error to pass NULL in either
o_sha or a_sha; let's assert() it.
- Your error path in the function made the caller assume that two input
matched and it is Ok to continue; it should instead force the caller to
stop so that the end user can notice.
- Return values from functions in convert.c are NOT error signals. A
true value is to let the caller know that the callee had to convert
(and a false value is that the callee didn't convert), and the next
transformation needs to be done on the result in the strbuf (as opposed
to the src buffer that was left inact).
In your use of renormalize_buffer(), your input "src" points at the
same strbuf as your output "dst", so your caller does not care if
renormalize didn't have to do anything. Either way, you would pick the
result up from the strbuf.
But using the return value to skip the comparison as if the call failed
is _wrong_. Even if there is no need for conversion for the given path
(i.e. 0 return from convert.c functions), you would need to pick up the
result and perform the comparison.
If you had a test that made sure that the merge works for paths that do
not need double-conversion, you might have caught the last issue. I
suspect that your new tests _only_ checked what happens to paths that
actually triggers these double conversion, without making sure that the
new code would not affect the cases where it shouldn't be involved, no?
It is a common trap to fall into not testing the negative case, when you
are working on your own shiny new toy. Let's be more careful when writing
new tests.
Thanks.
merge-recursive.c | 45 ++++++++++++++++++++++++++++++++++++++++++-
t/t6038-merge-text-auto.sh | 2 +-
2 files changed, 44 insertions(+), 3 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index 206c103..3c63e5e 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1056,6 +1056,47 @@ static unsigned char *stage_sha(const unsigned char *sha, unsigned mode)
return (is_null_sha1(sha) || mode == 0) ? NULL: (unsigned char *)sha;
}
+static int read_sha1_strbuf(const unsigned char *sha1, struct strbuf *dst)
+{
+ void *buf;
+ enum object_type type;
+ unsigned long size;
+ buf = read_sha1_file(sha1, &type, &size);
+ if (!buf)
+ return error("cannot read object %s", sha1_to_hex(sha1));
+ if (type != OBJ_BLOB) {
+ free(buf);
+ return error("object %s is not a blob", sha1_to_hex(sha1));
+ }
+ strbuf_attach(dst, buf, size, size + 1);
+ return 0;
+}
+
+static int blob_unchanged(const unsigned char *o_sha,
+ const unsigned char *a_sha,
+ const char *path)
+{
+ struct strbuf o = STRBUF_INIT;
+ struct strbuf a = STRBUF_INIT;
+ int ret;
+
+ if (!core_ll_merge_double_convert)
+ return sha_eq(o_sha, a_sha);
+
+ ret = 0; /* assume changed for safety */
+ assert(o_sha && a_sha);
+ if (read_sha1_strbuf(o_sha, &o) || read_sha1_strbuf(a_sha, &a))
+ goto error_return;
+ renormalize_buffer(path, o.buf, o.len, &o);
+ renormalize_buffer(path, a.buf, o.len, &a);
+ ret = (o.len == a.len && !memcmp(o.buf, a.buf, o.len));
+
+error_return:
+ strbuf_release(&o);
+ strbuf_release(&a);
+ return ret;
+}
+
/* Per entry merge function */
static int process_entry(struct merge_options *o,
const char *path, struct stage_data *entry)
@@ -1075,8 +1116,8 @@ static int process_entry(struct merge_options *o,
if (o_sha && (!a_sha || !b_sha)) {
/* Case A: Deleted in one */
if ((!a_sha && !b_sha) ||
- (sha_eq(a_sha, o_sha) && !b_sha) ||
- (!a_sha && sha_eq(b_sha, o_sha))) {
+ (!b_sha && blob_unchanged(o_sha, a_sha, path)) ||
+ (!a_sha && blob_unchanged(o_sha, b_sha, path))) {
/* Deleted in both or deleted in one and
* unchanged in the other */
if (a_sha)
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 2caebb9..4a7bc48 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -46,7 +46,7 @@ test_expect_success 'Check merging addition of text=auto' '
test_cmp file file.temp
'
-test_expect_failure 'Test delete/normalize conflict' '
+test_expect_success 'Test delete/normalize conflict' '
git checkout side &&
git reset --hard initial &&
git rm file &&
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-30 17:46 ` Junio C Hamano
@ 2010-06-30 21:32 ` Eyvind Bernhardsen
2010-07-01 3:33 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Eyvind Bernhardsen @ 2010-06-30 21:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Finn Arne Gangstad, git@vger.kernel.org List
Thanks for the detailed review and rewrite! I unexpectedly had to spend the evening working on non-git related stuff, so I haven't even had time to send my promised re-roll.
On 30. juni 2010, at 19.46, Junio C Hamano wrote:
> Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> writes:
>
>> Shouldn't the normalization in merge-recursive be conditional too?
>
> True, but your patch to merge-recursive is broken, I think. It should at
> least look like the attached rewrite.
Your patch is better than mine, but in my defense I think you misdiagnosed the big problems. The error path in normalized_eq returned 0 in case of problems, making the caller assume that the files differ and generate a merge conflict, and the return code from normalize_buffer was used correctly (see below).
[...]
> If you had a test that made sure that the merge works for paths that do
> not need double-conversion, you might have caught the last issue. I
> suspect that your new tests _only_ checked what happens to paths that
> actually triggers these double conversion, without making sure that the
> new code would not affect the cases where it shouldn't be involved, no?
The real tests I ran were a couple of huge merges, admittedly across a "text=auto" change, but most paths were not touched by either branch, so I would definitely have hit that issue.
> It is a common trap to fall into not testing the negative case, when you
> are working on your own shiny new toy. Let's be more careful when writing
> new tests.
Yes, absolutely. The tests are pretty threadbare for such a potentially dangerous change, so I'll cop to a "shiny new toy" charge. I'll also cop to writing hard-to-read code, but I still think it worked :)
Even though I like your version better than mine, I have a few comments, only some of which are defensive:
> + if (!core_ll_merge_double_convert)
> + return sha_eq(o_sha, a_sha);
I would rather do this:
if (sha_eq(o_sha, a_sha))
return 1;
if (!core_ll_merge_double_convert)
return 0;
If the sha1s are equal before normalization the files will be equal after normalization, so there's no sense in going through the rigmarole.
Bikeshed colour, I know, but core_ll_merge_double_convert is unwieldy and also a bit inaccurate since it's not just for ll_merge. How about core_merge_prefilter, with a corresponding change to the config setting? (I had this change as part of my unsent re-roll).
> + ret = 0; /* assume changed for safety */
> + assert(o_sha && a_sha);
> + if (read_sha1_strbuf(o_sha, &o) || read_sha1_strbuf(a_sha, &a))
> + goto error_return;
> + renormalize_buffer(path, o.buf, o.len, &o);
> + renormalize_buffer(path, a.buf, o.len, &a);
> + ret = (o.len == a.len && !memcmp(o.buf, a.buf, o.len));
This was one of your points, but I deliberately skipped the memcmp here if _neither_ of the renormalize_buffers did any work (binary "|" instead of boolean "||" to ensure both sides are evaluated, but the comment was perhaps a little obscure).
If the files had been equal without any normalization the sha_eq() would have caught it, so we know the files are different without having to compare them.
> +error_return:
> + strbuf_release(&o);
> + strbuf_release(&a);
> + return ret;
I'm not a goto objector, just curious: what's the advantage of "goto error_return" here vs. having the renormalizing code inside the if and inverting the test?
Thanks again for taking the time to improve my patch.
--
Eyvind
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Clarify text filter merge conflict reduction docs
2010-06-30 21:32 ` Eyvind Bernhardsen
@ 2010-07-01 3:33 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2010-07-01 3:33 UTC (permalink / raw)
To: Eyvind Bernhardsen
Cc: Junio C Hamano, Finn Arne Gangstad, git@vger.kernel.org List
Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> writes:
>> + if (!core_ll_merge_double_convert)
>> + return sha_eq(o_sha, a_sha);
>
> I would rather do this:
>
> if (sha_eq(o_sha, a_sha))
> return 1;
> if (!core_ll_merge_double_convert)
> return 0;
You are absolutely right. This part of my patch was an unnecessary
pessimization.
Will fix.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2010-07-01 3:33 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-27 19:43 [PATCH v4 0/3] CRLF merge conflict reduction, take 4 Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 1/3] Avoid conflicts when merging branches with mixed normalization Eyvind Bernhardsen
2010-06-28 8:02 ` Finn Arne Gangstad
2010-06-28 19:32 ` [PATCH] Clarify text filter merge conflict reduction docs Eyvind Bernhardsen
2010-06-28 20:31 ` Finn Arne Gangstad
2010-06-29 16:19 ` Junio C Hamano
2010-06-29 21:18 ` Eyvind Bernhardsen
2010-06-30 17:46 ` Junio C Hamano
2010-06-30 21:32 ` Eyvind Bernhardsen
2010-07-01 3:33 ` Junio C Hamano
2010-06-30 8:20 ` Eyvind Bernhardsen
2010-06-30 15:15 ` Junio C Hamano
2010-06-27 19:43 ` [PATCH v4 2/3] Try normalizing files to avoid delete/modify conflicts when merging Eyvind Bernhardsen
2010-06-27 19:43 ` [PATCH v4 3/3] Don't expand CRLFs when normalizing text during merge Eyvind Bernhardsen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).