git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [Bug?] Information around newlines lost in merge
Date: Tue, 20 Jun 2023 10:44:29 -0700
Message-ID: <xmqqh6r2ni4i.fsf@gitster.g>
In-Reply-To: <CAOLa=ZRumrpfG8FxQQG=Q8tGvxEapMvOx6SZyRPh0GSpn5iuUg@mail.gmail.com> (Karthik Nayak's message of "Tue, 20 Jun 2023 11:22:23 +0200")

Karthik Nayak <karthik.188@gmail.com> writes:

> When merging two files which contain newlines at the end, the blob
> created (with conflicts) is the same as two files created without
> newlines at the end.
>
> If this is expected behavior, what would be the best way to
> differentiate the two? This is not a newly introduced bug, but rather
> the behavior since the start, which makes me think that I'm missing
> something (verified via git bisect on latest git).

Strictly speaking, I suspect that the behaviour was different before
we introduced in-core 3-way merges of two blobs---back then we ran
the "merge" program (from the RCS suite).

If we start from an empty file and have two sides add different
incomplete lines (i.e. your "half" example, but without the leading
blank line), i.e.

	$ >O
	$ M="with a single line added by side %s (without terminating LF)."
	$ printf "$M" A >A
	$ printf "$M" B >B
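
A minimal sketch for double-checking that a file really ends in an
incomplete line.  The helper name ends_with_lf and the file names are
made up for illustration; it relies only on POSIX "tail" and on the
fact that command substitution strips trailing newlines.

```shell
# Hypothetical helper (not part of Git): succeed only if the last byte is LF.
ends_with_lf () {
	# "tail -c 1" prints the last byte; $(...) strips a trailing LF,
	# so an empty result for a non-empty file means it ended with LF.
	test -s "$1" && test -z "$(tail -c 1 "$1")"
}

cd "$(mktemp -d)"                     # scratch directory
printf 'an incomplete line' >incomplete
printf 'a complete line\n'  >complete

ends_with_lf incomplete || echo "incomplete: no terminating LF"
ends_with_lf complete   && echo "complete: ends with LF"
```

An empty file such as O is reported as not ending with LF, which is
an arbitrary choice made for this sketch.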

The original "git merge" that used the external "merge" program
would have produced this:

	$ merge -p B O A 2>E
	<<<<<<< B
	with a single line added by side B (without terminating LF).=======
	with a single line added by side A (without terminating LF).>>>>>>> A
	$ cat E
	merge: warning: conflicts during merge

That is, the output would be a mess that cannot even be machine
parsed.  This probably improved slightly when we switched to our
own internal 3-way merge of two blobs, exposed as "git merge-file",
which gives you:

        $ git merge-file -p A O B
        <<<<<<< A
        with a single line added by side A (without terminating LF).
        ||||||| O
        =======
        with a single line added by side B (without terminating LF).
        >>>>>>> B

And as you found out, if we added terminating LF to A and/or B, the
output would be the same.  You could argue that this result, while
less faithful to the input than the output from "merge" we saw
above, is at least machine parseable.
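
A rough sketch for reproducing that observation, assuming "git" is
on the PATH.  The file names with the _lf suffix are made up for this
example, and -L is used only to pin the labels, so that the two runs
can differ in nothing but their payload.

```shell
# Work in a scratch directory so nothing in the current tree is touched.
cd "$(mktemp -d)"
>O
printf 'line from side A'   >A
printf 'line from side B'   >B
printf 'line from side A\n' >A_lf
printf 'line from side B\n' >B_lf

# merge-file exits non-zero when there are conflicts, hence "|| :".
git merge-file -p -L A -L O -L B A    O B    >without_lf || :
git merge-file -p -L A -L O -L B A_lf O B_lf >with_lf    || :

# Byte-for-byte comparison of the two conflicted results.
cmp without_lf with_lf && echo "identical conflict output"
```

Whether the "||||||| O" section shows up depends on the configured
merge.conflictStyle, but that applies equally to both runs.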

The "7 repeated marker characters followed by a random label string"
lines that the merge machinery inserts cannot be relied on if you
are building a truly automated conflict resolver, so the lack of
this one bit of information from each side may be the least of your
problems.  What it means at the same time, though, is that you
_could_ propose an augmented output format, perhaps like this one:

        $ git merge-file -p A O B
        <<<<<<< A
        with a single line added by side A (without terminating LF).
        \No newline at end of file
        ||||||| O
        =======
        with a single line added by side B (without terminating LF).
        \No newline at end of file
        >>>>>>> B

It has exactly the same problem we already have with the conflict
section separator lines, in that lines that look exactly like these
extra lines _could_ exist in the payload, so it is not creating a
new problem.  But people may have built, and be happy enough with,
incomplete automation that relies on the faulty assumption that the
merged payload would never contain lines that can be mistaken for
conflict section separator lines, and such an augmented output
format _will_ be a breaking change to them.
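
For the separator lines alone, one possible way to sidestep the
collision for a single merge is to ask for markers longer than
anything that starts a line in the payload, via the --marker-size
option of "git merge-file".  A rough sketch; pick_marker_size is a
made-up helper, and it does nothing about the ambiguity of the
labels or of any extra "\No newline" line.

```shell
# Hypothetical helper: print a marker size one longer than the longest
# run of '<', '=', '>' or '|' that starts a line in any of the given
# files, and never less than the default size of 7.
pick_marker_size () {
	awk '
	match($0, /^(<+|=+|>+|[|]+)/) { if (RLENGTH > max) max = RLENGTH }
	END { print (max < 7 ? 7 : max + 1) }
	' "$@"
}

cd "$(mktemp -d)"                     # scratch directory
>O
printf '=======\npayload that looks like a separator\n' >A
printf 'ordinary payload\n' >B

pick_marker_size A O B
```

The result could then be passed along as

	$ git merge-file -p --marker-size="$(pick_marker_size A O B)" A O B

so that no payload line can begin with a run as long as the markers
used for that merge.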

So, I dunno.
