From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
Jonathan Tan <jonathantanmy@google.com>,
Taylor Blau <me@ttaylorr.com>, Junio C Hamano <gitster@pobox.com>,
Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>,
Derrick Stolee <stolee@gmail.com>,
Elijah Newren <newren@gmail.com>
Subject: [PATCH v4 0/6] Optimization batch 7: use file basenames to guide rename detection
Date: Thu, 11 Feb 2021 08:15:43 +0000
Message-ID: <pull.843.v4.git.1613031350.gitgitgadget@gmail.com>
In-Reply-To: <pull.843.v3.git.1612970140.gitgitgadget@gmail.com>
This series depends on ort-perf-batch-6[1].
This series uses file basenames (the portion of the path after the final
'/', including any extension) in a basic fashion to guide rename detection.
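As a rough illustration of the term, a basename can be taken with a single strrchr() call. This is only a sketch: path_basename() and same_basename() are hypothetical helpers written for this example, not functions from git's sources.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return the basename of a path, i.e. the portion
 * after the final '/', including any extension. A path containing no
 * '/' is its own basename; a path ending in '/' yields "".
 */
static const char *path_basename(const char *path)
{
	const char *slash;

	assert(path != NULL);
	slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/* Nonzero when two paths share a basename (extension included). */
static int same_basename(const char *a, const char *b)
{
	return !strcmp(path_basename(a), path_basename(b));
}
```

For example, path_basename("docs/config/ext.txt") returns "ext.txt", so docs/ext.txt and docs/config/ext.txt share a basename, while docs/ext.txt and docs/ext.md do not, because the extension is part of the basename.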
Changes since v3:
* update documentation as suggested by Junio
* NEW: add another patch at the end, to simplify patch series that will be
submitted later (please review!)
[1] https://lore.kernel.org/git/xmqqlfc4byt6.fsf@gitster.c.googlers.com/
Elijah Newren (6):
t4001: add a test comparing basename similarity and content similarity
diffcore-rename: compute basenames of all source and dest candidates
diffcore-rename: complete find_basename_matches()
diffcore-rename: guide inexact rename detection based on basenames
gitdiffcore doc: mention new preliminary step for rename detection
merge-ort: call diffcore_rename() directly
Documentation/gitdiffcore.txt | 20 ++++
diffcore-rename.c | 202 +++++++++++++++++++++++++++++++++-
merge-ort.c | 66 +++++++++--
t/t4001-diff-rename.sh | 24 ++++
4 files changed, 301 insertions(+), 11 deletions(-)
base-commit: 7ae9460d3dba84122c2674b46e4339b9d42bdedd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-843%2Fnewren%2Fort-perf-batch-7-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-843/newren/ort-perf-batch-7-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/843
Range-diff vs v3:
1: 3e6af929d135 = 1: 3e6af929d135 t4001: add a test comparing basename similarity and content similarity
2: 4fff9b1ff57b = 2: 4fff9b1ff57b diffcore-rename: compute basenames of all source and dest candidates
3: dc26881e4ed3 = 3: dc26881e4ed3 diffcore-rename: complete find_basename_matches()
4: 2493f4b2f55d = 4: 2493f4b2f55d diffcore-rename: guide inexact rename detection based on basenames
5: fc72d24a3358 ! 5: 4e86ed3f29d4 gitdiffcore doc: mention new preliminary step for rename detection
@@ Documentation/gitdiffcore.txt: a similarity score different from the default of
+deleted from a different directory, it will mark them as renames and
+exclude them from the later quadratic step (the one that pairwise
+compares all unmatched files to find the "best" matches, determined by
-+the highest content similarity). So, for example, if
-+docs/extensions.txt and docs/config/extensions.txt have similar
-+content, then they will be marked as a rename even if it turns out
-+that docs/extensions.txt was more similar to src/extension-checks.c.
-+At most, one comparison is done per file in this preliminary pass; so
-+if there are several extensions.txt files throughout the directory
-+hierarchy that were added and deleted, this preliminary step will be
-+skipped for those files.
++the highest content similarity). So, for example, if a deleted
++docs/ext.txt and an added docs/config/ext.txt are similar enough, they
++will be marked as a rename and prevent an added docs/ext.md that may
++be even more similar to the deleted docs/ext.txt from being considered
++as the rename destination in the later step. For this reason, the
++preliminary "match same filename" step uses a bit higher threshold to
++mark a file pair as a rename and stop considering other candidates for
++better matches. At most, one comparison is done per file in this
++preliminary pass; so if there are several ext.txt files throughout the
++directory hierarchy that were added and deleted, this preliminary step
++will be skipped for those files.
+
Note. When the "-C" option is used with `--find-copies-harder`
option, 'git diff-{asterisk}' commands feed unmodified filepairs to
-: ------------ > 6: fedb3d323d94 merge-ort: call diffcore_rename() directly
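The preliminary "match same filename" pass described in the documentation hunk above can be sketched in C as follows. This is a hypothetical illustration, not the code in diffcore-rename.c: struct candidate, find_basename_pairs(), toy_similarity() and the min_score parameter are invented for this example. The toy byte-wise score merely stands in for git's real content-similarity estimate, and the linear scans stand in for whatever lookup structure a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A rename candidate: its path plus (for this sketch) its full contents. */
struct candidate {
	const char *path;
	const char *content;
};

/* Portion of the path after the final '/', including any extension. */
static const char *path_basename(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/* Number of candidates in list[0..n) whose basename equals base. */
static size_t count_basename(const struct candidate *list, size_t n,
			     const char *base)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++)
		if (!strcmp(path_basename(list[i].path), base))
			count++;
	return count;
}

/*
 * Toy stand-in for a content-similarity score in 0..100: the percentage
 * of byte positions that agree, relative to the longer content.
 */
static int toy_similarity(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);
	size_t longer = la > lb ? la : lb, same = 0;

	if (!longer)
		return 100;
	for (size_t i = 0; i < la && i < lb; i++)
		if (a[i] == b[i])
			same++;
	return (int)(same * 100 / longer);
}

typedef int (*similarity_fn)(const char *a, const char *b);

/*
 * Preliminary pass: pair each deleted file with the added file of the
 * same basename, but only when that basename is unique among both the
 * deleted and the added files. At most one comparison is made per
 * deleted file. match[i] receives the index into dst[] paired with
 * src[i], or -1. Returns the number of pairs found.
 */
static size_t find_basename_pairs(const struct candidate *src, size_t nr_src,
				  const struct candidate *dst, size_t nr_dst,
				  similarity_fn similarity, int min_score,
				  long *match)
{
	size_t found = 0;

	assert(match || !nr_src);
	for (size_t i = 0; i < nr_src; i++) {
		const char *base = path_basename(src[i].path);

		match[i] = -1;
		if (count_basename(src, nr_src, base) != 1 ||
		    count_basename(dst, nr_dst, base) != 1)
			continue; /* ambiguous or absent: leave for the later step */

		for (size_t j = 0; j < nr_dst; j++) {
			if (strcmp(path_basename(dst[j].path), base))
				continue;
			if (similarity(src[i].content, dst[j].content) >= min_score) {
				match[i] = (long)j;
				found++;
			}
			break; /* the single comparison for this file */
		}
	}
	return found;
}
```

Passing a min_score above the normal rename threshold mirrors the documented choice of a somewhat higher bar for this step: an early pairing removes both files from the later, quadratic comparison, so it should only be accepted when the contents are clearly similar. A basename that occurs more than once on either side (several ext.txt files, say) is skipped entirely and left for that later step.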
--
gitgitgadget
Thread overview: 71+ messages
2021-02-06 22:52 [PATCH 0/3] Optimization batch 7: use file basenames to guide rename detection Elijah Newren via GitGitGadget
2021-02-06 22:52 ` [PATCH 1/3] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-06 22:52 ` [PATCH 2/3] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-06 22:52 ` [PATCH 3/3] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-07 14:38 ` Derrick Stolee
2021-02-07 19:51 ` Junio C Hamano
2021-02-08 8:38 ` Elijah Newren
2021-02-08 11:43 ` Derrick Stolee
2021-02-08 16:25 ` Elijah Newren
2021-02-08 17:37 ` Junio C Hamano
2021-02-08 22:00 ` Elijah Newren
2021-02-08 23:43 ` Junio C Hamano
2021-02-08 23:52 ` Elijah Newren
2021-02-08 8:27 ` Elijah Newren
2021-02-08 11:31 ` Derrick Stolee
2021-02-08 16:09 ` Elijah Newren
2021-02-07 5:19 ` [PATCH 0/3] Optimization batch 7: use file basenames to guide rename detection Junio C Hamano
2021-02-07 6:05 ` Elijah Newren
2021-02-09 11:32 ` [PATCH v2 0/4] " Elijah Newren via GitGitGadget
2021-02-09 11:32 ` [PATCH v2 1/4] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-09 13:17 ` Derrick Stolee
2021-02-09 16:56 ` Elijah Newren
2021-02-09 17:02 ` Derrick Stolee
2021-02-09 17:42 ` Elijah Newren
2021-02-09 11:32 ` [PATCH v2 2/4] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-09 13:25 ` Derrick Stolee
2021-02-09 17:17 ` Elijah Newren
2021-02-09 17:34 ` Derrick Stolee
2021-02-09 11:32 ` [PATCH v2 3/4] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-09 13:33 ` Derrick Stolee
2021-02-09 17:41 ` Elijah Newren
2021-02-09 18:59 ` Junio C Hamano
2021-02-09 11:32 ` [PATCH v2 4/4] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-09 12:59 ` Derrick Stolee
2021-02-09 17:03 ` Junio C Hamano
2021-02-09 17:44 ` Elijah Newren
2021-02-10 15:15 ` [PATCH v3 0/5] Optimization batch 7: use file basenames to guide " Elijah Newren via GitGitGadget
2021-02-10 15:15 ` [PATCH v3 1/5] t4001: add a test comparing basename similarity and content similarity Elijah Newren via GitGitGadget
2021-02-13 1:15 ` Junio C Hamano
2021-02-13 4:50 ` Elijah Newren
2021-02-13 23:56 ` Junio C Hamano
2021-02-14 1:24 ` Elijah Newren
2021-02-14 1:32 ` Junio C Hamano
2021-02-14 3:14 ` Elijah Newren
2021-02-10 15:15 ` [PATCH v3 2/5] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-13 1:32 ` Junio C Hamano
2021-02-10 15:15 ` [PATCH v3 3/5] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-13 1:48 ` Junio C Hamano
2021-02-13 18:34 ` Elijah Newren
2021-02-13 23:55 ` Junio C Hamano
2021-02-14 3:08 ` Elijah Newren
2021-02-10 15:15 ` [PATCH v3 4/5] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-13 1:49 ` Junio C Hamano
2021-02-10 15:15 ` [PATCH v3 5/5] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-10 16:41 ` Junio C Hamano
2021-02-10 17:20 ` Elijah Newren
2021-02-11 8:15 ` Elijah Newren via GitGitGadget [this message]
2021-02-11 8:15 ` [PATCH v4 1/6] t4001: add a test comparing basename similarity and content similarity Elijah Newren via GitGitGadget
2021-02-11 8:15 ` [PATCH v4 2/6] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-11 8:15 ` [PATCH v4 3/6] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-11 8:15 ` [PATCH v4 4/6] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-11 8:15 ` [PATCH v4 5/6] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-11 8:15 ` [PATCH v4 6/6] merge-ort: call diffcore_rename() directly Elijah Newren via GitGitGadget
2021-02-13 1:53 ` [PATCH v4 0/6] Optimization batch 7: use file basenames to guide rename detection Junio C Hamano
2021-02-14 7:51 ` [PATCH v5 " Elijah Newren via GitGitGadget
2021-02-14 7:51 ` [PATCH v5 1/6] t4001: add a test comparing basename similarity and content similarity Elijah Newren via GitGitGadget
2021-02-14 7:51 ` [PATCH v5 2/6] diffcore-rename: compute basenames of source and dest candidates Elijah Newren via GitGitGadget
2021-02-14 7:51 ` [PATCH v5 3/6] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-14 7:51 ` [PATCH v5 4/6] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-14 7:51 ` [PATCH v5 5/6] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-14 7:51 ` [PATCH v5 6/6] merge-ort: call diffcore_rename() directly Elijah Newren via GitGitGadget