All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>, Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH] merge-ort: avoid assuming all renames detected
Date: Sat, 15 Jan 2022 02:09:26 +0000	[thread overview]
Message-ID: <pull.1194.git.git.1642212566346.gitgitgadget@gmail.com> (raw)

From: Elijah Newren <newren@gmail.com>

In commit 8b09a900a1 ("merge-ort: restart merge with cached renames to
reduce process entry cost", 2021-07-16), we noted that in the merge-ort
steps of
    collect_merge_info()
    detect_and_process_renames()
    process_entries()
that process_entries() was expensive, and we could often make it cheaper
by changing this to
    collect_merge_info()
    detect_and_process_renames()
    <cache all the renames, and restart>
    collect_merge_info()
    detect_and_process_renames()
    process_entries()
because the second collect_merge_info() would be cheaper (we could avoid
traversing into some directories), the second
detect_and_process_renames() would be free since we had already detected
all renames, and then process_entries() has far fewer entries to handle.

However, this was built on the assumption that the first
detect_and_process_renames() actually detected all potential renames.
If someone has merge.renameLimit set to some small value, that
assumption is violated which manifests later with the following message:

    $ git -c merge.renameLimit=1 rebase upstream
    ...
    git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion
    `renames->cached_pairs_valid_side == 0' failed.

Turn off this cache-renames-and-restart whenever we cannot detect all
renames, and add a testcase that would have caught this problem.

Reported-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
    merge-ort: avoid assuming all renames detected
    
    Fixes https://lore.kernel.org/git/YeHTIfEutLYM4TIU@nand.local/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1194%2Fnewren%2Favoid-assertion-assuming-renames-found-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1194/newren/avoid-assertion-assuming-renames-found-v1
Pull-Request: https://github.com/git/git/pull/1194

 merge-ort.c                              |  4 ++
 t/t6429-merge-sequence-rename-caching.sh | 67 ++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index c3197970219..143d274f117 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3060,6 +3060,10 @@ static int detect_and_process_renames(struct merge_options *opt,
 	trace2_region_enter("merge", "regular renames", opt->repo);
 	detection_run |= detect_regular_renames(opt, MERGE_SIDE1);
 	detection_run |= detect_regular_renames(opt, MERGE_SIDE2);
+	if (renames->needed_limit != 0) {
+		renames->cached_pairs_valid_side = 0;
+		renames->redo_after_renames = 0;
+	}
 	if (renames->redo_after_renames && detection_run) {
 		int i, side;
 		struct diff_filepair *p;
diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
index 035edc40b1e..f2bc8a7d2a2 100755
--- a/t/t6429-merge-sequence-rename-caching.sh
+++ b/t/t6429-merge-sequence-rename-caching.sh
@@ -697,4 +697,71 @@ test_expect_success 'caching renames only on upstream side, part 2' '
 	)
 '
 
+#
+# The following testcase just creates two simple renames (slightly modified
+# on both sides but without conflicting changes), and a directory full of
+# files that are otherwise uninteresting.  The setup is as follows:
+#
+#   base:     unrelated/<BUNCH OF FILES>
+#             numbers
+#             values
+#   upstream: modify: numbers
+#             modify: values
+#   topic:    add: unrelated/foo
+#             modify: numbers
+#             modify: values
+#             rename: numbers -> sequence
+#             rename: values -> progression
+#
+# This is a trivial rename case, but we're curious what happens with a very
+# low renameLimit interacting with the restart optimization trying to notice
+# that unrelated/ looks like a trivial merge candidate.
+#
+test_expect_success 'avoid assuming we detected renames' '
+	git init redo-weirdness &&
+	(
+		cd redo-weirdness &&
+
+		mkdir unrelated &&
+		for i in $(test_seq 1 10)
+		do
+			>unrelated/$i
+		done &&
+		test_seq  2 10 >numbers &&
+		test_seq 12 20 >values &&
+		git add numbers values unrelated/ &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch upstream &&
+		test_seq  1 10 >numbers &&
+		test_seq 11 20 >values &&
+		git add numbers &&
+		git commit -m "Some tweaks" &&
+
+		git switch topic &&
+
+		>unrelated/foo &&
+		test_seq  2 12 >numbers &&
+		test_seq 12 22 >values &&
+		git add numbers values unrelated/ &&
+		git mv numbers sequence &&
+		git mv values progression &&
+		git commit -m A &&
+
+		#
+		# Actual testing
+		#
+
+		git switch --detach topic^0 &&
+
+		test_must_fail git -c merge.renameLimit=1 rebase upstream &&
+
+		git ls-files -u >actual &&
+		! test_file_is_empty actual
+	)
+'
+
 test_done

base-commit: 1ffcbaa1a5f10c9f706314d77f88de20a4a498c2
-- 
gitgitgadget

             reply	other threads:[~2022-01-15  2:09 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-15  2:09 Elijah Newren via GitGitGadget [this message]
2022-01-16 21:46 ` [PATCH] merge-ort: avoid assuming all renames detected Junio C Hamano
2022-01-16 23:19 ` Junio C Hamano
2022-01-17 18:25 ` [PATCH v2] " Elijah Newren via GitGitGadget
2022-01-17 19:33   ` Junio C Hamano
2022-01-17 21:21     ` Elijah Newren
2022-01-17 22:07       ` Taylor Blau
2022-01-17 22:23         ` Junio C Hamano
2022-01-17 22:16       ` Junio C Hamano
2022-06-30  9:53   ` SZEDER Gábor
2022-07-01  2:30     ` Elijah Newren
2022-07-01  5:21       ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1194.git.git.1642212566346.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.