* Replaying merges
@ 2024-05-18 0:35 Johannes Schindelin
2024-05-18 1:45 ` Elijah Newren
0 siblings, 1 reply; 3+ messages in thread
From: Johannes Schindelin @ 2024-05-18 0:35 UTC (permalink / raw)
To: Elijah Newren; +Cc: git
Hi Elijah,
I took the suggestion to heart that you explained a couple of times to me:
To replay merge commits (including their merge conflict resolutions) by
using the _remerged_ commit as merge base, the original merge commit as
merge head, and the newly-created merge (with conflicts and all) as HEAD.
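To make the roles of the three inputs concrete, here is a minimal single-file sketch using `git merge-file`, which performs the same kind of three-way merge on plain files. The file names R, O and N and their contents are made up for illustration; the script below works on whole trees via `git merge-tree` instead.

```shell
#!/bin/sh
# Sketch only: replay one file's conflict resolution with a three-way merge.
#   R = re-merge of the original parents (has conflict markers) -> merge base
#   O = the file as resolved in the original merge commit       -> merge head
#   N = fresh merge of the rewritten parents (has markers too)  -> "HEAD"
replay_resolution_demo () {
    dir=$(mktemp -d) || return 1
    printf '%s\n' 'intro' \
        '<<<<<<< refs/tmp/head' 'left' '=======' 'right' \
        '>>>>>>> refs/tmp/merge_head' \
        'middle' 'outro' >"$dir/R"
    printf '%s\n' 'intro' 'left and right' 'middle' 'outro' >"$dir/O"
    # N repeats the conflict verbatim and carries one unrelated change.
    printf '%s\n' 'intro' \
        '<<<<<<< refs/tmp/head' 'left' '=======' 'right' \
        '>>>>>>> refs/tmp/merge_head' \
        'middle' 'outro (rebased)' >"$dir/N"
    # Arguments are <current> <base> <other>; -p prints the result
    # instead of overwriting <current>.
    git merge-file -p "$dir/N" "$dir/R" "$dir/O"
    status=$?
    rm -rf "$dir"
    return $status
}

replay_resolution_demo
```

Because the conflict region is identical in R and N, the merge takes O's resolution for it and keeps N's unrelated change on the last line.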
I noodled on this idea a bit until I got it into a usable shape that I
applied to great effect when working on the recent embargoed releases.
Here it is, the script [*1*] that I used (basically replacing all the
`merge -C` instances in the rebase script with `replay-merge.sh`):
-- snip --
#!/bin/sh
die () {
echo "$*" >&2
exit 1
}
test $# = 2 ||
die "Usage: $0 <original-merge> <rewritten-merge-head>"
original_merge="$(git rev-parse --verify "$1")" ||
die "Not a revision? $1"
test ' ' = "$(git show -s --format=%P "$original_merge" | tr -dc ' ')" ||
die "Not a merge? $1"
rewritten_merge_head="$(git rev-parse --verify "$2" 2>/dev/null)" ||
rewritten_merge_head="$(git rev-parse --verify "refs/rewritten/$2")" ||
die "Not a revision? $2"
# Already merged?
if test 0 -eq $(git rev-list --count HEAD..$rewritten_merge_head)
then
echo "Already merged: $2" >&2
exit 0
fi
# Can we fast-forward instead?
if test "$(git rev-parse HEAD $rewritten_merge_head)" = "$(git rev-parse $original_merge^ $original_merge^2)"
then
echo "Fast-forwarding to $1" >&2
git merge --no-stat --ff-only $original_merge ||
die "Could not fast-forward to $original_merge"
exit 0
fi
# Only Git v2.45 and newer can handle the `--merge-base=<tree>` invocation
validate_git_version () {
empty_tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904
git merge-tree --merge-base=$empty_tree $empty_tree $empty_tree >/dev/null 2>&1 ||
die "Need a Git version that understands --merge-base=<tree-ish>"
}
validate_git_version
do_merge () {
git update-ref refs/tmp/head $1 &&
git update-ref refs/tmp/merge_head $2 &&
{ result="$(git merge-tree refs/tmp/head refs/tmp/merge_head)"; res=$?; } &&
echo "$result" | head -n 1 &&
return $res
}
remerge_original=$(do_merge $original_merge^ $original_merge^2)
test -n "$remerge_original" || die "Could not remerge $original_merge"
merge_new=$(do_merge HEAD $rewritten_merge_head)
test -n "$merge_new" || die "Could not merge $rewritten_merge_head"
new_tree=$(git merge-tree --merge-base=$remerge_original $original_merge $merge_new | head -n 1)
test -n "$new_tree" || die "Could not create new merge"
# Even though there might be merge conflicts, the `merge-tree` command might
# succeed with exit code 0! The reason is that the merge conflict may originate
# from one of the previous two merges.
files_with_conflicts="$(git diff $original_merge..$new_tree |
sed -ne '/^diff --git /{
# store the first file name in the hold area
s/^diff --git a\/\(.*\) b\/.*$/\1/
x
}' -e '/^+<<<<<<< refs\/tmp\/head$/{
# found a merge conflict
:1
# read all lines until the ==== line
n
/^+=======$/b2
b1
:2
# read all lines until the >>>> line
/+>>>>>>> refs\/tmp\/merge_head$/{
# print file name
x
p
# skip to next file
:3
n
/^diff --git/{
# store the first file name in the hold area
s/^diff --git a\/\(.*\) b\/.*$/\1/
x
b
}
b3
}
n
b2
}')"
# Is it a "Sync with <version>" merge? Then regenerate the log
sync_info="$(git cat-file commit $original_merge |
sed -n '/^$/{N;s/^\n//;/^Sync with 2\./{N;N;s/^\(.*\)\n\n\* \([^:]*\).*/\1,\2/p};q}')"
merge_msg=
if test -n "$sync_info"
then
merge_msg="$(printf '%s\t\t%s\n' $rewritten_merge_head "${sync_info#*,}" |
git fmt-merge-msg --log -m "${sync_info%,*}" |
grep -v '^#')"
fi
if test -z "$files_with_conflicts"
then
# No conflicts
committer="$(git var GIT_COMMITTER_IDENT)" ||
die "Could not get committer ident"
new_commit="$(git cat-file commit "$original_merge")" ||
die "Could not get commit message of $original_merge"
new_commit="$(echo "$new_commit" |
sed '1,/^$/{
s/^tree .*/tree '"$new_tree"'/
s/^committer .*/committer '"$committer"'/
/^parent /{
:1
N
s/.*\n//
/^parent /b1
i\
parent '"$(git rev-parse HEAD)"'\
parent '"$(git rev-parse $rewritten_merge_head)"'
}
}')"
if test -n "$merge_msg"
then
new_commit="$(printf '%s\n\n%s\n' \
"$(echo "$new_commit" | sed '/^$/q')" \
"$merge_msg")"
fi
new_commit="$(echo "$new_commit" | git hash-object -t commit -w --stdin)" ||
die "Could not transmogrify commit object"
git merge --no-stat -q --ff-only "$new_commit"
else
echo "no-ff" >"$(git rev-parse --git-path MERGE_MODE)"
git rev-parse "$rewritten_merge_head" >"$(git rev-parse --git-path MERGE_HEAD)"
if test -n "$merge_msg"
then
echo "$merge_msg"
else
git cat-file commit "$original_merge" |
sed '1,/^$/d'
fi >"$(git rev-parse --git-path MERGE_MSG)"
git read-tree -u --reset "$new_tree" ||
die "Could not update to $new_tree"
echo "$files_with_conflicts" |
while read file
do
echo "Needs merge: $file"
mode="$(git ls-tree $new_tree "$file" | sed 's/ .*//')" &&
a=$(git show "$new_tree:$file" |
sed -e '/^<<<<<<< refs\/tmp\/head$/d' \
-e '/^=======$/,/>>>>>>> refs\/tmp\/merge_head$/d' |
git hash-object -w --stdin) &&
b=$(git show "$new_tree:$file" |
sed -e '/^<<<<<<< refs\/tmp\/head$/,/^=======$/d' \
-e '/>>>>>>> refs\/tmp\/merge_head$/d' |
git hash-object -w --stdin) &&
printf "%s %s %s\t%s\n" \
0 $a 0 "$file" \
$mode $(git rev-parse HEAD:"$file") 1 "$file" \
$mode $a 2 "$file" \
$mode $b 3 "$file" |
git update-index --index-info ||
die "Could not update the index with '$file'"
done
die "There were merge conflicts"
fi
-- snap --
For the most part, this worked beautifully.
However. The devil lies in the detail. You will see that the majority of
the script is concerned with recreating the stages that need to be put
into the index. The reason is that the merge conflicts are already part of
the merge base and hence the `merge-tree` arguments do not reflect the
stages.
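For readers unfamiliar with the stages: a conflicted path is recorded in the index as up to three entries, stage 1 (base), stage 2 (ours) and stage 3 (theirs), and `git update-index --index-info` accepts them as `<mode> <object> <stage><TAB><path>` lines. A minimal sketch in a throwaway repository follows; the path `file.txt` and the blob contents are made up for illustration.

```shell
#!/bin/sh
# Sketch only: put three conflict stages for one path into the index.
# Runs in a subshell so the caller's working directory is left alone.
stage_conflict_demo () (
    repo=$(mktemp -d) || exit 1
    git init -q "$repo" && cd "$repo" || exit 1
    base=$(printf 'base\n' | git hash-object -w --stdin) &&
    ours=$(printf 'ours\n' | git hash-object -w --stdin) &&
    theirs=$(printf 'theirs\n' | git hash-object -w --stdin) &&
    # One line per stage: <mode> <object> <stage><TAB><path>.
    # If the path already had a stage-0 entry, it would have to be removed
    # first (the script above feeds a mode-0 line for that purpose).
    printf '%s %s %s\t%s\n' \
        100644 "$base" 1 file.txt \
        100644 "$ours" 2 file.txt \
        100644 "$theirs" 3 file.txt |
    git update-index --index-info &&
    # List the unmerged entries that were just created.
    git ls-files -u
    status=$?
    cd / && rm -rf "$repo"
    exit $status
)

stage_conflict_demo
```

The script above has to synthesize the stage 2 and stage 3 blobs by cutting the two sides out of the conflict markers, because the trees passed to `merge-tree` no longer contain the original sides.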
But it gets even worse. The biggest complication is not even addressed in
this script: when I realized what was going on, I understood immediately
that it was time to abandon the shell script and start implementing this
logic in C (which I can currently only do on my own time, which is
scarce). The biggest complication being the scenario... when a merge
conflict had been addressed in the original merge commit, but in the
replayed merge there is no conflict. In such a scenario, this script _will
create not one, but two merge conflicts, nested ones_!
I still do think that your idea has merit, but I fear that it won't ever
be as easy as performing multiple three-way merges in succession. To
address the observed problem, the code will always have to be aware of
unresolved conflicts in the provided merge base, so that it can handle
them appropriately, and not treat them as plain text, so that no nested
conflicts need to be created.
Unfortunately, I did not document properly in what precise circumstances
those nested conflicts were generated (I was kind of busy trying to
coordinate everything around the security bug-fix releases), but I hope to
find some time soon to do so, and to turn them into a set of test cases
that we can play with.
Ciao,
Johannes
Footnote *1*: You'd think that I'd learn from past experiences _not_ to
prototype in Bash when I want to eventually implement it in C. Honestly, I
thought I could get away with it because I failed to anticipate the many
complications, not the least of which being that there is currently no
_actually_ correct way to generate the stages. So basically I thought that
the script would consist of the part before the code comment starting with
"Even though there might be merge conflicts"...
* Re: Replaying merges
2024-05-18 0:35 Replaying merges Johannes Schindelin
@ 2024-05-18 1:45 ` Elijah Newren
[not found] ` <CANiSa6gyNpJ3cUNLD1hFnBYeDFm6aFYv8k41MGvX+C90G8oaaw@mail.gmail.com>
0 siblings, 1 reply; 3+ messages in thread
From: Elijah Newren @ 2024-05-18 1:45 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Hi Johannes!
On Fri, May 17, 2024 at 5:35 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> I took the suggestion to heart that you explained a couple of times to me:
> To replay merge commits (including their merge conflict resolutions) by
> using the _remerged_ commit as merge base, the original merge commit as
> merge head, and the newly-created merge (with conflicts and all) as HEAD.
>
> I noodled on this idea a bit until I got it into a usable shape that I
> applied to great effect when working on the recent embargoed releases.
>
> Here it is, the script [*1*] that I used (basically replacing all the
> `merge -C` instances in the rebase script with `replay-merge.sh`):
>
<snip>
> For the most part, this worked beautifully.
Cool to see someone try it out.
> However. The devil lies in the detail.
Yup, but details rather than detail. ;-)
<snip>
> The biggest complication being the scenario... when a merge
> conflict had been addressed in the original merge commit, but in the
> replayed merge there is no conflict. In such a scenario, this script _will
> create not one, but two merge conflicts, nested ones_!
Only if merge.conflictStyle="diff3"; if merge.conflictStyle="merge",
then there will be no nested conflict (since the nested conflict comes
from the fact that the base version had a conflict itself).
This is one of the issues I noted in my write up a couple years ago:
https://github.com/newren/git/blob/replay/replay-design-notes.txt#L315-L316
Further, it can get worse, since in the current code the inner
conflict from the base merge could be an already arbitrarily nested
merge conflict with N levels (due to recursive merging allowing
arbitrary nesting of merge conflicts), giving us an overall nesting of
N+1 merge conflicts rather than just the 2 you assumed. That's ugly
enough, but we also need to worry about ensuring the conflict markers
from different merges get different conflict marker lengths, which
presents an extra challenge since the outer merge here is not part of
the original recursive merge.
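As a small illustration of what distinct marker lengths look like, `git merge-file` accepts `--marker-size=<n>`. The sketch below gives the outer merge 9-character markers by hand so they cannot be confused with the 7-character markers already present in the base. The value 9, the helper name and the file contents are arbitrary choices for this example, not what a real implementation would necessarily pick.

```shell
#!/bin/sh
# Sketch only: give the outer merge longer markers than the ones that are
# already embedded in the base file.
distinct_markers_demo () {
    dir=$(mktemp -d) || return 1
    printf '%s\n' '<<<<<<< refs/tmp/head' 'left' '=======' 'right' \
        '>>>>>>> refs/tmp/merge_head' >"$dir/R"
    printf '%s\n' 'resolved by hand' >"$dir/O"
    printf '%s\n' 'merged cleanly' >"$dir/N"
    # --diff3 shows the base section; --marker-size sets the marker length
    # used by this (outer) merge only. A non-zero exit just means conflicts.
    git merge-file -p --diff3 --marker-size=9 \
        "$dir/N" "$dir/R" "$dir/O" || :
    rm -rf "$dir"
}

distinct_markers_demo
```

In the output, the outer conflict is delimited by 9-character markers while the inner one from the base keeps its 7-character markers, so the two levels can be told apart mechanically.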
In addition to these challenges, there are some others:
* What about when the remerged commit and the newly-created merge
have the "same" conflict? Does it actually look the "same" to the
diff machinery so that it can resolve the conflict away to how the
original merge resolved? (Answer: not with a naive merge of these
three commits; we need to do some extra tweaking. I'm actually
surprised you said this basic idea worked given this particular
problem.)
* What about conflicts with binary files? Or non-textual conflicts
of other types like modify/delete or rename/rename?
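One simple way the "same" conflict can fail to look the same, shown purely as an illustration and not necessarily the case meant in the first bullet: the conflict marker labels are part of the text being compared. In the sketch below (made-up contents, hypothetical helper name) the label used in N is a parameter. The script in the original message merges via the fixed refs `refs/tmp/head` and `refs/tmp/merge_head`, which keeps the labels identical in both merges.

```shell
#!/bin/sh
# Sketch only: the same conflict in R and N, with N's "ours" label given
# as $1. The merge is clean only when the label matches the one in R.
same_conflict_demo () {
    label=$1
    dir=$(mktemp -d) || return 1
    printf '%s\n' 'intro' '<<<<<<< refs/tmp/head' 'left' '=======' \
        'right' '>>>>>>> refs/tmp/merge_head' 'outro' >"$dir/R"
    printf '%s\n' 'intro' "<<<<<<< $label" 'left' '=======' \
        'right' '>>>>>>> refs/tmp/merge_head' 'outro' >"$dir/N"
    printf '%s\n' 'intro' 'left and right' 'outro' >"$dir/O"
    git merge-file -p "$dir/N" "$dir/R" "$dir/O"
    status=$?
    rm -rf "$dir"
    # 0 means the conflict was resolved away; non-zero means it was not.
    return $status
}

same_conflict_demo refs/tmp/head && echo "identical labels: resolved"
same_conflict_demo HEAD >/dev/null || echo "different labels: conflict"
```

When the labels match, N and R are byte-identical in the conflict region and O's resolution wins. When they differ, N has changed a line that O replaced, so the three-way merge reports a conflict again.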
> I still do think that your idea has merit, but I fear that it won't ever
> be as easy as performing multiple three-way merges in succession.
I totally agree we need to do more than the simple merge of those
three "commits"; I have ideas for this that address some of the
challenges over at
https://github.com/newren/git/blob/replay/replay-design-notes.txt#L264-L341
> To address the observed problem, the code will always have to be aware of
> unresolved conflicts in the provided merge base, so that it can handle
> them appropriately, and not treat them as plain text, so that no nested
> conflicts need to be created.
I agree we need to handle conflicts specially -- not only in the
provided merge base ('R' in my document) but also in the new merge of
the two parents (what you labelled HEAD and I labelled 'N').
> Unfortunately, I did not document properly in what precise circumstances
> those nested conflicts were generated (I was kind of busy trying to
> coordinate everything around the security bug-fix releases), but I hope to
> find some time soon to do so, and to turn them into a set of test cases
> that we can play with.
Yeah, we'll also need to add testcases for some of the other issues I
point out in that document.
I'm looking forward to my situation changing soon and hopefully
getting more time to work on things like this...
Thread overview: 3+ messages
2024-05-18 0:35 Replaying merges Johannes Schindelin
2024-05-18 1:45 ` Elijah Newren
[not found] ` <CANiSa6gyNpJ3cUNLD1hFnBYeDFm6aFYv8k41MGvX+C90G8oaaw@mail.gmail.com>
2024-05-18 17:50 ` Martin von Zweigbergk