git.vger.kernel.org archive mirror
* [Bug?] Information around newlines lost in merge
@ 2023-06-20  9:22 Karthik Nayak
  2023-06-20 17:44 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Karthik Nayak @ 2023-06-20  9:22 UTC (permalink / raw)
  To: git

When merging two files that end with a newline, the blob created (with
conflicts) is the same as the one created when merging two files that do
not end with a newline.

We can demonstrate this with a simple script that uses
`git-merge-tree(1)`. The script creates two files on master and on
newBranch and then merges the two branches. The merged files seem to
have the same content, even though the source files differed by a
trailing newline.

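#!/usr/bin/env bash

# Start from an empty base commit on a branch named master.
git init -b master
git commit --allow-empty -m "base commit"

git branch newBranch

# On master: "newline" ends with LF, "half" does not.
printf '\nA\n' > newline
printf '\nA' > half
git add .
git commit -m "master commit"

git checkout newBranch

# On newBranch: the same two files with conflicting content.
printf '\nB\n' > newline
printf '\nB' > half
git add .
git commit -m "branch commit"

# The first line of the merge-tree output is the OID of the merged tree.
treeOID=$(git merge-tree master newBranch | head -n1)
halfOID=$(git cat-file -p "$treeOID" | grep half | awk '{print $3}')
newlineOID=$(git cat-file -p "$treeOID" | grep newline | awk '{print $3}')

# Exit 1 if the two conflicted blobs are identical.
if [[ "$halfOID" == "$newlineOID" ]]; then
     exit 1
else
     exit 0
fi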

Is this expected behavior? Shouldn't the two merged files (newline,
half) differ? That is, shouldn't the file with a trailing newline
(newline) contain an extra newline compared to the file without one
(half)?

If this is expected behavior, what would be the best way to
differentiate the two? This is not a recently introduced bug, but rather
the behavior since the start (verified via git bisect on latest git),
which makes me think that I'm missing something.

The context is that it is hard to programmatically resolve conflicts
without losing information, since we don't know whether the file should
end with a newline or not.
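
One possible workaround, not something proposed in this thread, is to
inspect the two source blobs directly instead of the merged result. A
minimal sketch follows; `ends_with_newline` is a hypothetical helper
name introduced only for this illustration, and it relies solely on
`tail`, `od` and `tr`:

```shell
#!/usr/bin/env bash
# ends_with_newline is a hypothetical helper, not part of Git.
# It reads a byte stream on stdin and succeeds only if the final byte is LF.
ends_with_newline() {
    local last
    # od prints the last byte as two hex digits; empty input prints nothing.
    last=$(tail -c 1 | od -An -tx1 | tr -d ' \n')
    [ "$last" = "0a" ]
}

# Same payloads as the "newline" and "half" files in the script above.
printf '\nA\n' | ends_with_newline && echo "newline: terminated"
printf '\nA'   | ends_with_newline || echo "half: incomplete last line"
```

In a repository, each side could be fed to the helper with something
like `git show master:half | ends_with_newline`, assuming the branch and
file names from the script above.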

* Re: [Bug?] Information around newlines lost in merge
  2023-06-20  9:22 [Bug?] Information around newlines lost in merge Karthik Nayak
@ 2023-06-20 17:44 ` Junio C Hamano
  2023-06-21 11:41   ` Karthik Nayak
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2023-06-20 17:44 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

Karthik Nayak <karthik.188@gmail.com> writes:

> When merging two files which contain newlines at the end, the blob
> created (with conflicts) is the same as two files created without
> newlines at the end.
>
> If this is expected behavior, what would be the best way to
> differentiate the two? This is not a bug introduced, but rather the
> behavior since the start, which makes me think that I'm missing
> something (verified via git bisect on latest git).

Strictly speaking, I suspect that the behaviour was different before
we introduced in-core 3-way merges of two blobs---back then we ran
the "merge" program (from the RCS suite).

If we start from an empty file and have two sides add different
incomplete lines (i.e. your "half" example, but without the leading
blank line), i.e.

	$ >O
	$ M="with a single line added by side %s (without terminating LF)."
	$ printf "$M" A >A
	$ printf "$M" B >B

The original "git merge" that used the external "merge" program
would have produced this:

	$ merge -p B O A 2>E
	<<<<<<< B
	with a single line added by side B (without terminating LF).=======
	with a single line added by side A (without terminating LF).>>>>>>> A
	$ cat E
	merge: warning: conflicts during merge

That is, the output would be a mess that cannot even be machine
parsed.  It probably changed in a slightly improved way when we
switched to our own internal 3-way merge of two blobs, exposed as
"git merge-file", which gives you:

        $ git merge-file -p A O B
        <<<<<<< A
        with a single line added by side A (without terminating LF).
        ||||||| O
        =======
        with a single line added by side B (without terminating LF).
        >>>>>>> B

And as you found out, if we added terminating LF to A and/or B, the
output would be the same.  You could argue that the result is at
least machine parseable, instead of the output that is more faithful
to the input (which we've seen above, in the output from "merge").
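
A minimal sketch to check this observation outside a repository,
assuming a `git` binary on PATH. The file names and the `-L` labels are
arbitrary choices made for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: does a terminating LF on both sides change the conflicted output?
# File names and labels are illustrative only.
work=$(mktemp -d)
cd "$work"

: >O                         # empty common ancestor
printf 'A'   >A_half         # side A, incomplete last line
printf 'B'   >B_half         # side B, incomplete last line
printf 'A\n' >A_full         # side A, terminated last line
printf 'B\n' >B_full         # side B, terminated last line

# merge-file exits with the number of conflicts, so a non-zero status
# is expected here and deliberately ignored.
git merge-file -p -L A -L O -L B A_half O B_half >merged_half || true
git merge-file -p -L A -L O -L B A_full O B_full >merged_full || true

# Compare the two conflicted results byte for byte.
if cmp -s merged_half merged_full; then
    result=identical
else
    result=different
fi
echo "$result"
```

If the behaviour described above holds, the two merged files compare
equal, so the trailing-LF bit of each side cannot be recovered from the
conflicted output alone.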

Since the "7 repeated marker characters followed by a random label
string" lines that the merge machinery inserts cannot be relied on if
you are building a truly automated conflict resolver, the lack of this
one bit of information from each side may be the least of your
problems.  At the same time, it means that you _could_ propose an
augmented output format, perhaps like this one:

        $ git merge-file -p A O B
        <<<<<<< A
        with a single line added by side A (without terminating LF).
        \No newline at end of file
        ||||||| O
        =======
        with a single line added by side B (without terminating LF).
        \No newline at end of file
        >>>>>>> B

It has exactly the same problem we already have with the conflict
section separator lines, in that lines that look exactly like these
extra lines _could_ exist in the payload, so it is not creating a new
problem.  But people may have built, and be happy enough with, their
incomplete automation that relies on the faulty assumption that the
merged payload would never contain lines that can be mistaken for
conflict section separator lines, and such an augmented output format
_will_ be a breaking change to them.

So, I dunno.
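
To make the augmented format above concrete, here is a rough sketch of
how one side of a conflict could be emitted with the extra marker line.
`annotate_side` is a hypothetical helper written for this illustration;
Git does not implement it, and the marker text is copied from the
example above:

```shell
#!/usr/bin/env bash
# annotate_side is a hypothetical helper, not part of Git.
# It copies a file to stdout and, when the last line is incomplete,
# completes it and appends the marker line from the proposed format.
annotate_side() {
    local file=$1 last
    cat "$file"
    # Hex value of the final byte; empty for an empty file.
    last=$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')
    if [ -s "$file" ] && [ "$last" != "0a" ]; then
        printf '\n\\No newline at end of file\n'
    fi
}

work=$(mktemp -d)
printf 'with a single line added by side B' >"$work/B"
annotate_side "$work/B"
```

As noted above, a payload whose last line already reads like the marker
would still be ambiguous, so this only narrows the gap and does not
close it.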

* Re: [Bug?] Information around newlines lost in merge
  2023-06-20 17:44 ` Junio C Hamano
@ 2023-06-21 11:41   ` Karthik Nayak
  0 siblings, 0 replies; 3+ messages in thread
From: Karthik Nayak @ 2023-06-21 11:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Jun 20, 2023 at 7:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
>         $ git merge-file -p A O B
>         <<<<<<< A
>         with a single line added by side A (without terminating LF).
>         \No newline at end of file
>         ||||||| O
>         =======
>         with a single line added by side B (without terminating LF).
>         \No newline at end of file
>         >>>>>>> B
>
> It has exactly the same problem we already have with the conflict
> section separator lines, in that lines that look exactly like these
> extra lines _could_ exist in the payload, so it is not creating a new
> problem.  But people may have built, and be happy enough with, their
> incomplete automation that relies on the faulty assumption that the
> merged payload would never contain lines that can be mistaken for
> conflict section separator lines, and such an augmented output format
> _will_ be a breaking change to them.
>
> So, I dunno.

Thank you for taking the time to respond. I'm wondering if there is
merit in modifying the merge algorithm.

Perhaps something where files merged with a terminating LF would retain
it. So merging "A\n" and "B\n" would produce
"<<<<<<< master\nA\n=======\nB\n>>>>>>> newBranch\n\n", whereas files
merged without a terminating LF wouldn't, so merging "A" and "B" would
produce "<<<<<<< master\nA\n=======\nB\n>>>>>>> newBranch\n". That
would make it easier to parse.

But overall, I get what you're saying. I will drop it here as expected behavior!
