From: "René Scharfe" <l.s.r@web.de>
To: Elijah Newren <newren@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Laďa Tesařík" <lada.tesarik@olc.cz>,
"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Lost file after git merge
Date: Sat, 30 Jul 2022 16:44:50 +0200
Message-ID: <d461718f-cc72-96e2-4de6-4cc67e3b95a5@web.de>
In-Reply-To: <CABPp-BE4saKAboS=SPQmQe6n2=Fnhv7pL4_JfF2Zwg5Zhp7Vjw@mail.gmail.com>

On 30.07.22 at 04:16, Elijah Newren wrote:
> On Fri, Jul 29, 2022 at 1:34 PM René Scharfe <l.s.r@web.de> wrote:
>>
>> On 28.07.22 at 19:11, Junio C Hamano wrote:
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>>> On Thu, Jul 28 2022, Laďa Tesařík wrote:
>>>>
>>>>> 1. I added a file called 'new_file' to a master branch.
>>>>> 2. Then I created branch feature/2 and deleted the file in master
>>>>> 3. Then I deleted the file in branch feature/2 as well.
>>>>> 4. I created 'new_file' on branch feature/2 again.
>>>
>>> It heavily depends on how this creation is done, i.e. what went into
>>> the created file. Imagine that a file existed with content A at
>>> commit 0, both commits 1 and 2 removed it on their forked history,
>>> and then commit 3 added exactly the same content A to the same path:
>>>
>>>       1---3
>>>      /     \
>>>  ----0---2---4---->
>>>
>>> When you are about to merge 2 and 3 to create 4, what would a
>>> three-way merge see?
>>>
>>> 0 had content A at path P
>>> 2 said "no we do not want content A at path P"
>>> 3 said "we are happy with content A at path P"
>>>
>>> So the net result is that 0-->3 "one side did not touch A at P" and
>>> 0-->2 "one side removed A at P".
>>>
>>> Three-way merge between X and Y is all about taking what X did if Y
>>> didn't have any opinion on what X touched. This is exactly that
>>> case. The history 0--->3 didn't have any opinion on what should be
>>> in P or whether P should exist, and that is why there is no change
>>> between these two endpoints.
>>
>> The last sentence is not necessarily true. You could also say that
>> 0--->3 cared so much about path P having content A that it brought it
>> back from the void. Determining whether a de-facto revert
>> - intended to return to an uncaring state of "take whatever main has" or
>> - meant to choose *that* specific content which incidentally is on main
>> is not possible from the snapshots at the merge point alone, I think.
>>
>> Checking if 0...3 touched P and leaving that path unmerged out of
>> caution shouldn't be terribly expensive.
>
> I think it might be terribly expensive.
>
> Walking history can easily be the slow part of such an operation, e.g.
> can_fast_forward() taking roughly 100 times as long as doing the
> merge_incore_recursive() portion that creates the new merged toplevel
> tree[1]. (And can_fast_forward() is a form of history walk that
> doesn't involve traversing into any trees, so I suspect it's a cheaper
> history traversal than what is being suggested).
>
> Focusing on the tree traversal side, this suggested change would
> essentially disable the trivial directory resolution optimizations in
> merge-ort[2]. (Note that the trivial directory resolution sped up a
> rebase that didn't involve very many renames by a factor of 25). The
> whole point of that optimization was to avoid walking into trees that
> were only changed on one side, where possible. Your proposed change
> would be saying we always have to walk into trees that either side
> modified...and do so for every intermediate commit as well so that we
> can fully enumerate all (temporarily) changed files.

True: Compared to just checking if a path was touched by 3, a history
traversal can take arbitrarily long. At least it's bounded by the merge
base and a specific path. And renames complicate the picture, but only
full renames (same blob or tree ID) need to be considered. That feels
doable in a reasonable amount of time, but it's not as cheap as ignoring
the history.
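
To make the bounded check concrete, here is a minimal sketch on a toy
commit graph. It is not Git code: the Commit type, ancestors() and
path_touched() are made-up names, a snapshot is just a dict from path to
blob ID, and renames are ignored entirely. It only illustrates walking
the commits between the merge base and one tip and asking whether any of
them changed a given path.

```python
from typing import Dict, List, Set, Tuple

# Toy model (hypothetical, not Git's data structures): each commit id maps
# to (parent ids, snapshot), where a snapshot maps path -> blob id.
Commit = Tuple[List[str], Dict[str, str]]


def ancestors(commits: Dict[str, Commit], start: str) -> Set[str]:
    """Return start plus every commit reachable from it via parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(commits[cid][0])
    return seen


def path_touched(commits: Dict[str, Commit], base: str, tip: str,
                 path: str) -> bool:
    """True if any commit reachable from tip but not from base changed path."""
    excluded = ancestors(commits, base)  # the walk stops at the merge base
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in seen or cid in excluded:
            continue
        seen.add(cid)
        parents, tree = commits[cid]
        for parent in parents:
            # A differing (or missing) entry means this commit touched path.
            if commits[parent][1].get(path) != tree.get(path):
                return True
        stack.extend(parents)
    return False


# The history from the quoted diagram, with an unrelated path Q added.
commits: Dict[str, Commit] = {
    "0": ([], {"P": "A", "Q": "B"}),
    "1": (["0"], {"Q": "B"}),            # removes P
    "2": (["0"], {"Q": "B"}),            # removes P
    "3": (["1"], {"P": "A", "Q": "B"}),  # brings P back with content A
}

print(path_touched(commits, "0", "3", "P"))  # True, although 0 and 3 agree on P
print(path_touched(commits, "0", "3", "Q"))  # False
```

The cost is proportional to the number of commits between base and tip,
which is the expense discussed above; the snapshot comparison of 0 and 3
alone would report P as untouched.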

Assuming that one side doesn't care about a path because it has the same
content as the merge base is tempting. And reverts that break this
assumption are probably quite rare. Still, it led to an unintended
outcome here. Reminds me of a recent chess robot incident [3]. Speed
is nice and safety has a cost, but do we already make the best possible
tradeoff here?
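
For illustration, the assumption can be written down as a per-path rule.
This is a simplified sketch, not how merge-ort is implemented:
merge_entry() and merge_entry_cautious() are hypothetical names, None
stands for "path absent", and the two *_touched flags are assumed to
come from some history check that is not shown here.

```python
from typing import Optional, Tuple

Entry = Optional[str]  # blob id at a path, or None if the path is absent


def merge_entry(base: Entry, ours: Entry, theirs: Entry) -> Tuple[Entry, bool]:
    """Snapshot-only three-way rule; returns (result, conflict)."""
    if ours == theirs:
        return ours, False
    if ours == base:       # our side looks unchanged: take their result
        return theirs, False
    if theirs == base:     # their side looks unchanged: take our result
        return ours, False
    return None, True      # both sides changed differently


def merge_entry_cautious(base: Entry, ours: Entry, theirs: Entry,
                         ours_touched: bool,
                         theirs_touched: bool) -> Tuple[Entry, bool]:
    """Same rule, but a side only counts as unchanged if its history
    never touched the path (flags supplied by the caller)."""
    if ours == theirs:
        return ours, False
    if ours == base and not ours_touched:
        return theirs, False
    if theirs == base and not theirs_touched:
        return ours, False
    return None, True      # leave the path unmerged


# Path P in the reported case: base 0 has A, side 2 removed it, side 3
# removed it and later re-added A.
print(merge_entry("A", None, "A"))                       # (None, False)
print(merge_entry_cautious("A", None, "A", True, True))  # (None, True)
```

The first call silently drops the file, which matches what was
reported; the second leaves the path unmerged at the price of needing
the history information.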

> [1] https://lore.kernel.org/git/CABPp-BE48=97k_3tnNqXPjSEfA163F8hoE+HY0Zvz1SWB2B8EA@mail.gmail.com/
> [2] https://lore.kernel.org/git/pull.988.v4.git.1626841444.gitgitgadget@gmail.com/

[3] https://www.theguardian.com/sport/2022/jul/24/chess-robot-grabs-and-breaks-finger-of-seven-year-old-opponent-moscow