git.vger.kernel.org archive mirror
* Git merge conflicts and encoding of logs
From: Junichi Uekawa @ 2008-12-23  4:48 UTC
  To: git

Hi,


When a merge conflicts, Git inserts '<<< first line of commit log message',
'===', and '>>>' markers into the text file that has the conflict.

Unfortunately, the encoding of the text file may be different from the
log message encoding, and that results in a file which has a mixed
encoding (which is pretty hard to edit from any editor BTW).

My use case is editing pLaTeX files (iso-2022-jp encoded) with log
messages in utf-8.
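
To make the problem concrete, here is a minimal Python sketch (not Git
code; the subject line and file text are made-up sample data) of what such
a conflicted file looks like at the byte level.  ISO-2022-JP is a 7-bit
encoding, so the file body is ASCII plus escape sequences, while a UTF-8
subject on the marker line brings in bytes above 0x7f:

```python
# Hypothetical sample data: a UTF-8 commit subject and ISO-2022-JP file content.
subject = "日本語のコミットログ"
body = "日本語のファイルです。\n"

marker_line = ("<<<<<<< " + subject + "\n").encode("utf-8")
payload = body.encode("iso-2022-jp")
conflicted = marker_line + payload


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The marker line's high bytes are not legal ISO-2022-JP.
iso_ok = decodes_as(conflicted, "iso-2022-jp")

# Decoding as UTF-8 does not fail, but the file body is left as raw
# JIS escape sequences instead of readable Japanese.
as_utf8 = conflicted.decode("utf-8")
has_raw_escapes = "\x1b$B" in as_utf8
```

Neither encoding gives an editor a clean view of the whole file, which is
the mixed-encoding situation described above.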

http://git.debian.org/?p=tokyodebian/monthly-report.git;a=summary




... Thinking about it, it's probably the same encoding problem as git
blame.  Is there already a good way to fix this or is this something
that needs fixing?


regards,
	junichi
-- 
dancer@{netfort.gr.jp,debian.org}


* Re: Git merge conflicts and encoding of logs
From: Junio C Hamano @ 2008-12-23  8:22 UTC
  To: Junichi Uekawa; +Cc: git

Junichi Uekawa <dancer@netfort.gr.jp> writes:

> Git merge conflict will insert '<<< first line of commit log message'
> '===' '>>>' markers to the text file that is causing a conflict.
>
> Unfortunately, the encoding of the text file may be different from the
> log message encoding, and that results in a file which has a mixed
> encoding (which is pretty hard to edit from any editor BTW).
>
> My use case is editing platex files (iso-2022-jp encoded) with log
> messages of utf-8.
>
> ... Thinking about it, it's probably the same encoding problem as git
> blame.

What 69cd8f6 (builtin-blame: Reencode commit messages according to git-log
rules., 2008-10-22) does to git-blame is to re-encode the data taken from
the commit log to i18n.logoutputencoding, and put that in the datastream.

If your commit objects have names and messages in utf-8, and if you set
i18n.logoutputencoding to iso-2022-jp, that would reencode data taken from
the commit object in iso-2022-jp and sprinkle them in the blame
datastream.
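
As a rough illustration of that re-encoding step (a Python sketch, not
Git's implementation; `reencode` is a hypothetical helper name, and keeping
the original bytes when conversion fails is an assumption of this sketch),
text stored as UTF-8 is decoded and written back out in the configured
output encoding:

```python
from typing import Optional


def reencode(data: bytes, out_encoding: Optional[str],
             in_encoding: str = "utf-8") -> bytes:
    """Convert data from in_encoding to out_encoding.

    Returns the input unchanged when no output encoding is configured,
    when both encodings are the same, or when the conversion fails.
    """
    if not out_encoding or out_encoding.lower() == in_encoding.lower():
        return data
    try:
        return data.decode(in_encoding).encode(out_encoding)
    except (UnicodeError, LookupError):
        # Undecodable input, unencodable characters, or unknown codec name.
        return data


utf8_subject = "日本語のログ".encode("utf-8")
jis_subject = reencode(utf8_subject, "iso-2022-jp")
```

Here `jis_subject` holds the same text as 7-bit ISO-2022-JP bytes, which is
what would be mixed into the output stream in place of the UTF-8 original.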

The issue would be certainly similar, *if* anything on your <<</===/>>>
lines came from commit log message, but I couldn't trigger what you
describe.  I prepared a history of this shape:

   B
  /
 o---A

with ISO-2022-JP payload and UTF-8 commit log message.  Then, I added:

        [i18n]
                logoutputencoding = iso-2022-jp

which lets me read "git log -p --all" quite comfortably.  Everything comes
out as good old JISX0208.  So far, so good.

Then while on branch B, I tried to merge A, which resulted in conflicts
that looked like this:

<<<<<<< HEAD:foo
これはサイドブランチの変更です。
やはり JIS コードで書いてます。
=======
日本語のファイルです。
JIS コードで書いてます。
>>>>>>> master:foo

The above will probably come out as UTF-8 in this mail text, but the point
is that the conflict side markers do not have anything but the filename and
the branch name.  I am still scratching my head trying to see where in the
merge-recursive codepath you got a snippet of the log message.

A bundle from my test repository is attached.  You can use it to reproduce
the repository like this:
 
    $ cd /var/tmp && mkdir test && cd test && git init
    $ git pull ../x.bndl master
    $ git fetch ../x.bndl side:side


[-- Attachment #2: a bundle of the sample history --]
[-- Type: application/octet-stream, Size: 917 bytes --]


* Re: Git merge conflicts and encoding of logs
From: Johannes Sixt @ 2008-12-23  8:41 UTC
  To: Junio C Hamano; +Cc: Junichi Uekawa, git

Junio C Hamano wrote:
> <<<<<<< HEAD:foo
> これはサイドブランチの変更です。
> やはり JIS コードで書いてます。
> =======
> 日本語のファイルです。
> JIS コードで書いてます。
> >>>>>>> master:foo
> 
> The above will probably come out as UTF-8 in this mail text, but the point
> is that the conflict side markers do not have anything but the filename and
> the branch name.  I am still scratching my head trying to see where in the
> merge-recursive codepath you got a snippet of the log message.

Try rebase -i instead of merge: This should put summary lines onto the
conflict markers.

-- Hannes


* Re: Git merge conflicts and encoding of logs
From: Junio C Hamano @ 2008-12-23 10:04 UTC
  To: Johannes Sixt; +Cc: Junichi Uekawa, git

Johannes Sixt <j.sixt@viscovery.net> writes:

> Junio C Hamano wrote:
>> <<<<<<< HEAD:foo
>> これはサイドブランチの変更です。
>> やはり JIS コードで書いてます。
>> =======
>> 日本語のファイルです。
>> JIS コードで書いてます。
>> >>>>>>> master:foo
>> 
>> The above will probably come out as UTF-8 in this mail text, but the point
>> is that the conflict side markers do not have anything but the filename and
>> the branch name.  I am still scratching my head trying to see where in the
>> merge-recursive codepath you got a snippet of the log message.
>
> Try rebase -i instead of merge: This should put summary lines onto the
> conflict markers.

Ah, that's cherry-pick.

The fix should be around the area this weather-balloon patch touches.

Note that this does not work correctly yet; it seems that the string is
truncated somewhere.

But I won't be debugging it further for now...

----
 builtin-revert.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git c/builtin-revert.c w/builtin-revert.c
index d48313c..47ff16f 100644
--- c/builtin-revert.c
+++ w/builtin-revert.c
@@ -244,6 +244,19 @@ static struct tree *empty_tree(void)
 	return tree;
 }
 
+static char *branch_label_to_output_encoding(char *oneline)
+{
+	if (git_log_output_encoding &&
+	    strcmp(git_log_output_encoding, git_commit_encoding)) {
+		char *it = reencode_string(oneline,
+					   git_log_output_encoding,
+					   git_commit_encoding);
+		if (it)
+			return it;
+	}
+	return oneline;
+}
+
 static int revert_or_cherry_pick(int argc, const char **argv)
 {
 	unsigned char head[20];
@@ -373,7 +386,7 @@ static int revert_or_cherry_pick(int argc, const char **argv)
 	read_cache();
 	init_merge_options(&o);
 	o.branch1 = "HEAD";
-	o.branch2 = oneline;
+	o.branch2 = branch_label_to_output_encoding(oneline);
 
 	head_tree = parse_tree_indirect(head);
 	next_tree = next ? next->tree : empty_tree();


* Re: Git merge conflicts and encoding of logs
From: Junichi Uekawa @ 2008-12-23 23:37 UTC
  To: Junio C Hamano; +Cc: Johannes Sixt, Junichi Uekawa, git

At Tue, 23 Dec 2008 02:04:57 -0800,
Junio C Hamano wrote:
> 
> Johannes Sixt <j.sixt@viscovery.net> writes:
> 
> > Junio C Hamano wrote:
> >> <<<<<<< HEAD:foo
> >> これはサイドブランチの変更です。
> >> やはり JIS コードで書いてます。
> >> =======
> >> 日本語のファイルです。
> >> JIS コードで書いてます。
> >> >>>>>>> master:foo
> >> 
> >> The above will probably come out as UTF-8 in this mail text, but the point
> >> is that the conflict side markers do not have anything but the filename and
> >> the branch name.  I am still scratching my head trying to see where in the
> >> merge-recursive codepath you got a snippet of the log message.
> >
> > Try rebase -i instead of merge: This should put summary lines onto the
> > conflict markers.
> 
> Ah, that's cherry-pick.
> 
> The fix should be around the area this weather-balloon patch touches.
> 
> Note that this does not work correctly yet; it seems that the string is
> truncated somewhere.


Hi, I've read the patches from Alexander Gavrilov that try to support a
per-file encoding in gitattributes (encoding); I assume such features
are not in yet.

My git repository has a mixture of iso-2022-jp (pLaTeX source), EUC-JP
(some graph source code and other things), and UTF-8 files, so unifying
the log output encoding to iso-2022-jp is not ideal.  With a single
setting, I'd have to guess which file is going to conflict during a
cherry-pick and set the log output encoding to match.
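
A per-file approach would avoid the guessing.  As a minimal sketch of the
idea in Python (the pattern table and the `encoding_for` / `encode_label`
helpers are hypothetical, and this is not an existing Git feature), the
text placed on the conflict marker would be encoded according to the path
being merged rather than one global setting:

```python
from fnmatch import fnmatch

# Hypothetical path-pattern -> encoding table, in the spirit of a
# per-file "encoding" attribute.  The patterns are illustrative only.
FILE_ENCODINGS = [
    ("*.tex", "iso-2022-jp"),
    ("*.dot", "euc-jp"),
]


def encoding_for(path: str, default: str = "utf-8") -> str:
    """Return the encoding of the first pattern matching path, else default."""
    for pattern, encoding in FILE_ENCODINGS:
        if fnmatch(path, pattern):
            return encoding
    return default


def encode_label(subject: str, path: str) -> bytes:
    """Encode a conflict-marker label in the conflicted file's own encoding."""
    return subject.encode(encoding_for(path), errors="replace")
```

With such a lookup, a conflict in a `.tex` file would get an ISO-2022-JP
label and a conflict in a UTF-8 file would keep a UTF-8 label, without
changing any global configuration in between.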



regards,
	junichi
-- 
dancer@{netfort.gr.jp,debian.org}
