git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* How to turn off rename detection for cherry-pick?
@ 2024-08-26 11:09 Pavel Rappo
  2024-08-29  8:47 ` Pavel Rappo
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Rappo @ 2024-08-26 11:09 UTC (permalink / raw)
  To: Git mailing list

As far as I understand, the "ort" strategy does not allow turning off
rename detection. The best you can do is set the similarity index
threshold to 100%. However, it does not address the case of candidate
files being exactly the same.

So, it seems to me that the only way to do it would be to downgrade to
the "recursive" strategy and set the "no-renames" option:

    git cherry-pick --strategy=recursive --strategy-option=no-renames
<commit>...

Is my understanding correct? Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-26 11:09 How to turn off rename detection for cherry-pick? Pavel Rappo
@ 2024-08-29  8:47 ` Pavel Rappo
  2024-08-29 21:43   ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Rappo @ 2024-08-29  8:47 UTC (permalink / raw)
  To: Git mailing list

The reason I ask this is that we've run into a (probably practically
rare) case where cherry-pick changes a wrong file. We want to be able
to detect such cases.

Distilled from the actual repo, here is the case. Create a repo first:

    mkdir cherry-pick-carefully
    cd cherry-pick-carefully
    git init -b main .
    cat << EOF > x.txt
    1
    2
    3
    4
    5
    EOF
    git add x.txt
    git commit -m "Add x.txt"
    git checkout -b feature
    git rm x.txt
    git add x.txt
    git commit -m "Delete x.txt"
    cat << EOF > y.txt
    1
    2
    3
    4
    EOF
    git add y.txt
    git commit -m "Add y.txt, similar but not equal"
    cat << EOF > y.txt
    1
    2
    4
    EOF
    git add y.txt
    git commit -m "Slightly change y.txt"
    git checkout main

Now try to cherry-pick feature's head commit onto main:

    cd cherry-pick-carefully
    git cherry-pick feature

With a default git configuration, this should (surprisingly to many)
change x.txt instead of y.txt, which is not what the user would
expect.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-29  8:47 ` Pavel Rappo
@ 2024-08-29 21:43   ` Jeff King
  2024-08-29 23:12     ` Pavel Rappo
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2024-08-29 21:43 UTC (permalink / raw)
  To: Pavel Rappo; +Cc: Elijah Newren, Git mailing list

On Thu, Aug 29, 2024 at 09:47:52AM +0100, Pavel Rappo wrote:

> The reason I ask this is that we've run into a (probably practically
> rare) case where cherry-pick changes a wrong file. We want to be able
> to detect such cases.

You can pass merge strategy options on the command line. The old
"recursive" strategy has a "no-renames" option, so:

  git cherry-pick --strategy=recursive -Xno-renames feature

generates a modify/delete conflict for your example. Curiously, the
modern default, "ort", does not seem to respect that option. You can
bump up the limit to require exact renames, though, which does prevent
the mismerge in your case. Like:

  git cherry-pick -Xfind-renames=100% feature

There are also other strategies that do not do rename detection, but I
think you are better off using one of the more commonly-used strategies
and just disabling renames. IMHO it's a bug that ort doesn't respect
-Xno-renames.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-29 21:43   ` Jeff King
@ 2024-08-29 23:12     ` Pavel Rappo
  2024-08-30  0:31       ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Rappo @ 2024-08-29 23:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren, Git mailing list

You seem to have confirmed my understanding that I described in my
initial email (you replied to my second email in this thread).

On Thu, Aug 29, 2024 at 10:43 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, Aug 29, 2024 at 09:47:52AM +0100, Pavel Rappo wrote:
>
> > The reason I ask this is that we've run into a (probably practically
> > rare) case where cherry-pick changes a wrong file. We want to be able
> > to detect such cases.
>
> You can pass merge strategy options on the command line. The old
> "recursive" strategy has a "no-renames" option, so:
>
>   git cherry-pick --strategy=recursive -Xno-renames feature
>
> generates a modify/delete conflict for your example. Curiously, the
> modern default, "ort", does not seem to respect that option. You can
> bump up the limit to require exact renames, though, which does prevent
> the mismerge in your case. Like:
>
>   git cherry-pick -Xfind-renames=100% feature
>
> There are also other strategies that do not do rename detection, but I
> think you are better off using one of the more commonly-used strategies
> and just disabling renames. IMHO it's a bug that ort doesn't respect
> -Xno-renames.
>
> -Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-29 23:12     ` Pavel Rappo
@ 2024-08-30  0:31       ` Jeff King
  2024-08-30 10:35         ` Pavel Rappo
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jeff King @ 2024-08-30  0:31 UTC (permalink / raw)
  To: Pavel Rappo; +Cc: Elijah Newren, Git mailing list

On Fri, Aug 30, 2024 at 12:12:07AM +0100, Pavel Rappo wrote:

> You seem to have confirmed my understanding that I described in my
> initial email (you replied to my second email in this thread).

Heh, I did not even see the first message in the thread. But since we
independently arrived at the same conclusions, I guess we can consider
everything there accurate. :)

I do think it's a bug that ort doesn't respect -Xno-renames. The fix is
probably something like this:
 
diff --git a/merge-ort.c b/merge-ort.c
index 3752c7e595..94b3ce734c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3338,7 +3338,7 @@ static int detect_regular_renames(struct merge_options *opt,
 	repo_diff_setup(opt->repo, &diff_opts);
 	diff_opts.flags.recursive = 1;
 	diff_opts.flags.rename_empty = 0;
-	diff_opts.detect_rename = DIFF_DETECT_RENAME;
+	diff_opts.detect_rename = opts->detect_rename;
 	diff_opts.rename_limit = opt->rename_limit;
 	if (opt->rename_limit <= 0)
 		diff_opts.rename_limit = 7000;

though I'm not sure how DIFF_DETECT_COPIES should be handled
("recursive" silently downgrades it to DIFF_DETECT_RENAME).

I've added ort's author to the thread, so hopefully he should have a
more clueful response.

-Peff

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-30  0:31       ` Jeff King
@ 2024-08-30 10:35         ` Pavel Rappo
  2024-08-30 14:33         ` Elijah Newren
  2024-08-30 16:14         ` Junio C Hamano
  2 siblings, 0 replies; 10+ messages in thread
From: Pavel Rappo @ 2024-08-30 10:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren, Git mailing list

On Fri, Aug 30, 2024 at 1:31 AM Jeff King <peff@peff.net> wrote:
>
> <snip>
>
> I've added ort's author to the thread, so hopefully he should have a
> more clueful response.
>

Thanks!

Some time ago I created a gist which might better summarise the issue:
https://gist.github.com/pavelrappo/3b1cd89b6015ab0eade1b11876d563ff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-30  0:31       ` Jeff King
  2024-08-30 10:35         ` Pavel Rappo
@ 2024-08-30 14:33         ` Elijah Newren
  2024-08-30 16:49           ` Junio C Hamano
  2024-08-30 16:14         ` Junio C Hamano
  2 siblings, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2024-08-30 14:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Pavel Rappo, Git mailing list

On Thu, Aug 29, 2024 at 5:31 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Aug 30, 2024 at 12:12:07AM +0100, Pavel Rappo wrote:
>
> > You seem to have confirmed my understanding that I described in my
> > initial email (you replied to my second email in this thread).
>
> Heh, I did not even see the first message in the thread. But since we
> independently arrived at the same conclusions, I guess we can consider
> everything there accurate. :)
>
> I do think it's a bug that ort doesn't respect -Xno-renames.

I'm kind of splitting hairs, but doesn't "bug" imply it was
overlooked?  Documentation/merge-strategies.txt makes it clear it
wasn't.  ;-)

However, I agree that it makes sense to start supporting.  I didn't at
first because (a) I could find no evidence at the time that anyone
actually ever used the option in conjunction with merges for
behavioral reasons (this thread from Pavel is the first counterexample
I've seen), and (b) there was lots of evidence that the related config
option was in widespread use as a workaround to the unnecessary
performance problems of rename handling with the recursive merge
algorithm.  In particular, I wanted to hear about any performance
issues with renames in merges, and it was far easier to make the new
(then-experimental) algorithm just ignore these options than attempt
to do widespread convincing of folks to switch the relevant config
back on.  I think that a few years has given us more than ample time
to hear about potential remaining performance issues with renames in
merges, and this thread is a good example of why to support this
option.

> The fix is probably something like this:

The fix probably starts with something like this...

> diff --git a/merge-ort.c b/merge-ort.c
> index 3752c7e595..94b3ce734c 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3338,7 +3338,7 @@ static int detect_regular_renames(struct merge_options *opt,
>         repo_diff_setup(opt->repo, &diff_opts);
>         diff_opts.flags.recursive = 1;
>         diff_opts.flags.rename_empty = 0;
> -       diff_opts.detect_rename = DIFF_DETECT_RENAME;
> +       diff_opts.detect_rename = opts->detect_rename;
>         diff_opts.rename_limit = opt->rename_limit;
>         if (opt->rename_limit <= 0)
>                 diff_opts.rename_limit = 7000;
>
> though I'm not sure how DIFF_DETECT_COPIES should be handled
> ("recursive" silently downgrades it to DIFF_DETECT_RENAME).

Not downgrading DIFF_DETECT_COPIES for merging would break both
"recursive" and "ort" in all kinds of interesting ways.  I once spent
a little time thinking how copy detection might affect things (long
before working on ort), and noted that under such a scenario,
"recursive" would have multiple logic bugs (mostly in the form of
missing additional needed logic rather than existing logic being
incorrect, but in places that aren't necessarily obvious at first).
Someone would have to very carefully audit the whole file containing
either recursive or ort's algorithms if they wanted to make either
somehow support copy detection.  "ort" would be even more problematic
than "recursive" for such a case -- I took full advantage of the
differences between rename detection and copy detection while
optimizing ort and I think it was intrinsic to every one of the major
optimizations I did.  So, if you really wanted to support copy
detection in ort, the very first step would be adding code to turn off
every single one of its major optimizations (including tree-level
merging, which didn't sound like a rename or copy detection like
thing, but hinged on how rename detection works to make sure it didn't
miss important halfs of renames).

But, the bigger issue with copies is how exactly could a merge
algorithm use them in any way that would make any logical sense?  What
are you going to do, take the modifications that one side of history
made to some source file, and apply those modifications to all the
copies of the file that the other side of history made?  That sounds
crazy and counter-intuitive to me.  (...and incidentally, like a
factory for creating all kinds of crazy corner case issues; we could
probably make things even messier than the mod6 testcases in the
testsuite.)

> I've added ort's author to the thread, so hopefully he should have a
> more clueful response.

Yeah, not so straightforward.  Even with the downgrading of copy
detection to rename detection in this area (like "recursive" does),
this isn't sufficient.  IIRC (I looked at this about 2 years ago or so
when some others asked off-list), at least one of the optimizations
managed to bake in an assumption about having gone through the rename
detection codepaths.  As best I remember, when you attempt to turn
rename detection off, it triggers an assertion failure in a
non-obvious place far removed from the original issue, and I only ever
got a slightly hacky workaround or two for the issue.  I didn't have
good motivation or rationale to pursue very far at the time, though,
so after some hours of looking at it, I just decided to move on to
something else.

Anyway, I'll add support for no-renames to my list of things needed
before we can delete merge-recursive.[ch] and make requests for
"recursive" just map to "ort".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-30  0:31       ` Jeff King
  2024-08-30 10:35         ` Pavel Rappo
  2024-08-30 14:33         ` Elijah Newren
@ 2024-08-30 16:14         ` Junio C Hamano
  2024-08-30 17:17           ` Junio C Hamano
  2 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2024-08-30 16:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Pavel Rappo, Elijah Newren, Git mailing list

Jeff King <peff@peff.net> writes:

> though I'm not sure how DIFF_DETECT_COPIES should be handled
> ("recursive" silently downgrades it to DIFF_DETECT_RENAME).

I somehow suspect that we would want to go there to avoid
overloading ourselves (and end-users) with conceptual complexity.

If we detect that the file the side branch modified was copied to
create multiple files on our side (with or without modification on
our end), where would we want to reapply their changes to?  To the
original file only?  To all copies?  The same story goes for the
opposite direction.  How should possibly different changes they made
to their copies be merged back to the sole original file we have
(which we may or may not have modified)?  What if the both sides
made copies of the same file, and now we need to shuffle NxM
combination of changes?  I would expect it would be a whole can of
worms, which we _may_ be able to define concrete rules how we would
handle each of these cases but we may have a hard time to explain in
simple terms to end-users.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-30 14:33         ` Elijah Newren
@ 2024-08-30 16:49           ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2024-08-30 16:49 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Pavel Rappo, Git mailing list

Elijah Newren <newren@gmail.com> writes:

> Anyway, I'll add support for no-renames to my list of things needed
> before we can delete merge-recursive.[ch] and make requests for
> "recursive" just map to "ort".

Sounds sensible.  Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How to turn off rename detection for cherry-pick?
  2024-08-30 16:14         ` Junio C Hamano
@ 2024-08-30 17:17           ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2024-08-30 17:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Pavel Rappo, Elijah Newren, Git mailing list

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> though I'm not sure how DIFF_DETECT_COPIES should be handled
>> ("recursive" silently downgrades it to DIFF_DETECT_RENAME).
>
> I somehow suspect that we would want to go there to avoid
> overloading ourselves (and end-users) with conceptual complexity.

Of course, we would *NOT* want to go there X-<.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-08-30 17:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-26 11:09 How to turn off rename detection for cherry-pick? Pavel Rappo
2024-08-29  8:47 ` Pavel Rappo
2024-08-29 21:43   ` Jeff King
2024-08-29 23:12     ` Pavel Rappo
2024-08-30  0:31       ` Jeff King
2024-08-30 10:35         ` Pavel Rappo
2024-08-30 14:33         ` Elijah Newren
2024-08-30 16:49           ` Junio C Hamano
2024-08-30 16:14         ` Junio C Hamano
2024-08-30 17:17           ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).