From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Thomas Gummerer <t.gummerer@gmail.com>
Subject: [PATCH v2 05/10] rerere: add some documentation
Date: Tue, 5 Jun 2018 22:52:14 +0100 [thread overview]
Message-ID: <20180605215219.28783-6-t.gummerer@gmail.com> (raw)
In-Reply-To: <20180605215219.28783-1-t.gummerer@gmail.com>
Add some documentation for the logic behind the conflict normalization
in rerere.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
Documentation/technical/rerere.txt | 142 +++++++++++++++++++++++++++++
rerere.c | 4 -
2 files changed, 142 insertions(+), 4 deletions(-)
create mode 100644 Documentation/technical/rerere.txt
diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..2c517fe0fc
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,142 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Differnt conflict styles and branch names are dealt with by stripping
+that information from the conflict markers, and removing extraneous
+information from the `diff3` conflict style.
+
+Branches being merged in different order are dealt with by sorting the
+conflict hunks. More on each of those parts in the following
+sections.
+
+Once these two normalization operations are applied, a conflict ID is
+created based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Stripping extraneous information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2. The common ancestor of
+these branches has a file with with a line with the string "A" (for
+brevity this line is called "line A" for brevity in the following) in
+it. In branch AB this line is changed to "B", in AC, this line is
+changed to C, and branch AC2 is forked off of AC, after the line was
+changed to C.
+
+Now forking a branch ABAC off of branch AB and then merging AC into it,
+we'd get a conflict like the following:
+
+ <<<<<<< HEAD
+ B
+ =======
+ C
+ >>>>>>> AC
+
+Now doing the analogous with AC2 (forking a branch ABAC2 off of branch
+AB and then merging branch AC2 into it), maybe using the diff3
+conflict style, we'd get a conflict like the following:
+
+ <<<<<<< HEAD
+ B
+ ||||||| merged common ancestors
+ A
+ =======
+ C
+ >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+ After examining what branches AB and AC did, I believe that making
+ line A into line D is the best thing to do that is compatible with
+ what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same. To do this, the labels on the conflict
+markers are stripped, and the diff3 output is removed. The above
+examples would both result in the following normalized conflict:
+
+ <<<<<<<
+ B
+ =======
+ C
+ >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part. And then four branches
+are forked that do these things:
+
+ - AB: changes A to B
+ - AC: changes A to C
+ - XY: changes X to Y
+ - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order. The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+ After examining what branches AB and AC did, I believe that
+ making line A into line D is the best thing to do that is
+ compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean. We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z". We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part becames X or Y", and whenever
+any of these four patterns appear, and we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters. The conflict markers are stripped out
+before the sha1 is calculated. So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line C, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
diff --git a/rerere.c b/rerere.c
index 74ce422634..ef23abe4dd 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
* and NUL concatenated together.
*
* Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
*/
static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
{
--
2.17.0.410.g65aef3a6c4
next prev parent reply other threads:[~2018-06-05 20:48 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
2018-05-21 19:00 ` Stefan Beller
2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
2018-05-24 7:20 ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
2018-05-24 9:20 ` Junio C Hamano
2018-06-03 11:41 ` Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-05-24 9:47 ` Junio C Hamano
2018-05-24 18:54 ` Thomas Gummerer
2018-05-25 1:20 ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-05-24 10:02 ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-05-24 10:21 ` Junio C Hamano
2018-05-24 19:07 ` Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
2018-06-05 21:52 ` Thomas Gummerer [this message]
2018-06-05 21:52 ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
2018-07-03 21:05 ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
2018-07-06 17:56 ` Junio C Hamano
2018-07-10 21:37 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 00/11] " Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
2018-07-15 13:24 ` Simon Ruderich
2018-07-16 20:40 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
2018-07-30 17:50 ` Junio C Hamano
2018-07-30 20:21 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-07-30 17:50 ` Junio C Hamano
2018-07-30 20:45 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-07-30 17:50 ` Junio C Hamano
2018-07-30 20:47 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
2018-07-30 17:51 ` Junio C Hamano
2018-07-14 21:44 ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
2018-07-30 17:51 ` Junio C Hamano
2018-07-14 21:44 ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-07-30 17:45 ` Junio C Hamano
2018-07-30 20:20 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
2018-07-30 17:50 ` [PATCH v3 00/11] rerere: handle nested conflicts Junio C Hamano
2018-07-30 20:49 ` Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 " Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 02/11] rerere: lowercase error messages Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 03/11] rerere: wrap paths in output in sq Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 04/11] rerere: mark strings for translation Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 06/11] rerere: fix crash with files rerere can't handle Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 08/11] rerere: factor out handle_conflict function Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 09/11] rerere: return strbuf from handle path Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-08-22 11:00 ` Ævar Arnfjörð Bjarmason
2018-08-22 16:06 ` Junio C Hamano
2018-08-22 20:34 ` Thomas Gummerer
2018-08-22 21:07 ` Junio C Hamano
2018-08-24 21:56 ` Thomas Gummerer
2018-08-24 22:10 ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
2018-08-24 22:10 ` [PATCH 2/2] rerere: add not about files with existing conflict markers Thomas Gummerer
2018-08-28 21:27 ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Thomas Gummerer
2018-08-28 21:27 ` [PATCH v2 2/2] rerere: add note about files with existing " Thomas Gummerer
2018-08-29 16:04 ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Junio C Hamano
2018-09-01 9:00 ` Thomas Gummerer
2018-08-27 17:33 ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Junio C Hamano
2018-08-28 22:05 ` Thomas Gummerer
2018-08-27 19:36 ` Junio C Hamano
2018-08-05 17:20 ` [PATCH v4 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180605215219.28783-6-t.gummerer@gmail.com \
--to=t.gummerer@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.