* Git merge conflicts and encoding of logs
From: Junichi Uekawa @ 2008-12-23 4:48 UTC
To: git
Hi,
A Git merge conflict inserts '<<< first line of commit log message',
'===' and '>>>' markers into the text file that has the conflict.
Unfortunately, the encoding of the text file may be different from the
log message encoding, and that results in a file which has a mixed
encoding (which is pretty hard to edit from any editor BTW).
My use case is editing platex files (iso-2022-jp encoded) in a repository
whose log messages are in utf-8.
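To make the mixed-encoding problem concrete, here is a minimal sketch that builds such a file by hand. It assumes an iconv(1) that supports ISO-2022-JP; the file names and the sample label are made up for illustration and are not taken from the repository above.

```shell
#!/bin/sh
# Sketch only: file names and sample text are hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# File payload, stored as ISO-2022-JP (7-bit, escape-sequence based).
printf '日本語\n' | iconv -f UTF-8 -t ISO-2022-JP > payload.txt

# A conflict-marker line whose label is UTF-8 (8-bit bytes).
printf '>>>>>>> 日本語のログ\n' > marker.txt

# The conflicted file is effectively the concatenation of both.
cat payload.txt marker.txt > mixed.txt

# Decoding the whole file as ISO-2022-JP is expected to fail,
# because the UTF-8 label bytes are not valid in that encoding.
if iconv -f ISO-2022-JP -t UTF-8 mixed.txt > /dev/null 2>&1; then
    echo "mixed.txt decoded cleanly (unexpected)"
else
    echo "mixed.txt is not valid ISO-2022-JP"
fi
```

An editor that opens the file as ISO-2022-JP hits the same invalid bytes on the marker line, which is why the result is hard to edit.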
http://git.debian.org/?p=tokyodebian/monthly-report.git;a=summary
... Thinking about it, it's probably the same encoding problem as git
blame. Is there already a good way to fix this or is this something
that needs fixing?
regards,
junichi
--
dancer@{netfort.gr.jp,debian.org}
* Re: Git merge conflicts and encoding of logs
From: Junio C Hamano @ 2008-12-23 8:22 UTC
To: Junichi Uekawa; +Cc: git
Junichi Uekawa <dancer@netfort.gr.jp> writes:
> Git merge conflict will insert '<<< first line of commit log message'
> '===' '>>>' markers to the text file that is causing a conflict.
>
> Unfortunately, the encoding of the text file may be different from the
> log message encoding, and that results in a file which has a mixed
> encoding (which is pretty hard to edit from any editor BTW).
>
> My use case is editing platex files (iso-2022-jp encoded) with log
> messages of utf-8.
>
> ... Thinking about it, it's probably the same encoding problem as git
> blame.
What 69cd8f6 (builtin-blame: Reencode commit messages according to git-log
rules., 2008-10-22) does to git-blame is to re-encode the data taken from
the commit log to i18n.logoutputencoding, and put that in the datastream.
If your commit objects have names and messages in utf-8, and if you set
i18n.logoutputencoding to iso-2022-jp, git-blame re-encodes the data taken
from the commit objects into iso-2022-jp and sprinkles it into the blame
datastream.
The issue would certainly be similar, *if* anything on your <<</===/>>>
lines came from the commit log message, but I couldn't trigger what you
describe. I prepared a history of this shape:
          B
         /
    o---A
with ISO-2022-JP payload and UTF-8 commit log message. Then, I added:
    [i18n]
        logoutputencoding = iso-2022-jp
which lets me read "git log -p --all" quite comfortably. Everything comes
out as good old JISX0208. So far, so good.
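As a rough way to check that setting from a script, the following sketch creates a throwaway repository with a UTF-8 commit message, sets i18n.logOutputEncoding, and converts the output of "git log" back to UTF-8. It assumes git was built with iconv support and that iconv(1) knows ISO-2022-JP; the repository contents, identity and file names are invented for the example.

```shell
#!/bin/sh
# Sketch only: repository layout and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.invalid"

# Commit with a UTF-8 log message (the default commit encoding).
echo hello > foo
git add foo
git commit -q -m '日本語のログ'

# Ask git to re-encode log output, as in the [i18n] section above.
git config i18n.logOutputEncoding iso-2022-jp

# The subject line now comes out as ISO-2022-JP bytes...
git log -1 --format=%s > subject.iso2022jp

# ...and converting it back should give the original UTF-8 subject.
iconv -f ISO-2022-JP -t UTF-8 subject.iso2022jp > subject.utf8
cat subject.utf8
```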
Then while on branch B, I tried to merge A, which resulted in conflicts
that looked like this:
<<<<<<< HEAD:foo
これはサイドブランチの変更です。
やはり JIS コードで書いてます。
=======
日本語のファイルです。
JIS コードで書いてます。
>>>>>>> master:foo
The above will probably come out as UTF-8 in this mail text, but the point
is that the conflict side markers do not have anything but the filename and
the branch name. I am still scratching my head trying to see where in the
merge-recursive codepath you got a snippet of the log message.
A bundle from my test repository is attached. You can use it to reproduce
the repository like this:
$ cd /var/tmp && mkdir test && cd test && git init
$ git pull ../x.bndl master
$ git fetch ../x.bndl side:side
[-- Attachment #2: a bundle of the sample history --]
[-- Type: application/octet-stream, Size: 917 bytes --]
* Re: Git merge conflicts and encoding of logs
From: Johannes Sixt @ 2008-12-23 8:41 UTC
To: Junio C Hamano; +Cc: Junichi Uekawa, git
Junio C Hamano wrote:
> <<<<<<< HEAD:foo
> これはサイドブランチの変更です。
> やはり JIS コードで書いてます。
> =======
> 日本語のファイルです。
> JIS コードで書いてます。
> >>>>>>> master:foo
>
> The above will probably come out as UTF-8 in this mail text, but the point
> is that the conflict side markers do not have anything but the filename and
> the branch name. I am still scratching my head trying to see where in the
> merge-recursive codepath you got a snippet of the log message.
Try rebase -i instead of merge: This should put summary lines onto the
conflict markers.
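
A small reproduction of this, as a sketch: it uses cherry-pick directly (which rebase -i runs for each picked commit) on a throwaway repository. The branch names, file contents and identity are invented here, and the payload is plain ASCII to keep the script short; only the log message is non-ASCII.

```shell
#!/bin/sh
# Sketch only: branch names and contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.invalid"

# Common ancestor.
printf 'base\n' > foo
git add foo
git commit -q -m 'base'
base=$(git rev-parse HEAD)

# Side branch: conflicting change with a UTF-8 subject line.
git checkout -q -b side
printf 'side change\n' > foo
git commit -q -a -m '日本語のログ'

# Work branch: a different change to the same line.
git checkout -q -b work "$base"
printf 'work change\n' > foo
git commit -q -a -m 'work change'

# The cherry-pick is expected to stop with a conflict (non-zero exit).
git cherry-pick side > /dev/null 2>&1 || true

# The closing marker carries the picked commit's subject line.
grep '^>>>>>>>' foo > marker.txt
cat marker.txt
```

The exact layout of the label (abbreviated hash plus subject) differs between git versions, so the sketch only looks at the marker line rather than assuming a fixed format.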
-- Hannes
* Re: Git merge conflicts and encoding of logs
From: Junio C Hamano @ 2008-12-23 10:04 UTC
To: Johannes Sixt; +Cc: Junichi Uekawa, git
Johannes Sixt <j.sixt@viscovery.net> writes:
> Junio C Hamano wrote:
>> <<<<<<< HEAD:foo
>> これはサイドブランチの変更です。
>> やはり JIS コードで書いてます。
>> =======
>> 日本語のファイルです。
>> JIS コードで書いてます。
>> >>>>>>> master:foo
>>
>> The above will probably come out as UTF-8 in this mail text, but the point
>> is that the conflict side markers do not have anything but the filename and
>> the branch name. I am still scratching my head trying to see where in the
>> merge-recursive codepath you got a snippet of the log message.
>
> Try rebase -i instead of merge: This should put summary lines onto the
> conflict markers.
Ah, that's cherry-pick.
The fix should be around the area this weather-balloon patch touches.
Note that this does not work correctly yet, and it seems that the string is
truncated somewhere.
But I won't be debugging it further for now...
----
builtin-revert.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)
diff --git c/builtin-revert.c w/builtin-revert.c
index d48313c..47ff16f 100644
--- c/builtin-revert.c
+++ w/builtin-revert.c
@@ -244,6 +244,19 @@ static struct tree *empty_tree(void)
 	return tree;
 }
+static char *branch_label_to_output_encoding(char *oneline)
+{
+	if (git_log_output_encoding &&
+	    strcmp(git_log_output_encoding, git_commit_encoding)) {
+		char *it = reencode_string(oneline,
+					   git_log_output_encoding,
+					   git_commit_encoding);
+		if (it)
+			return it;
+	}
+	return oneline;
+}
+
 static int revert_or_cherry_pick(int argc, const char **argv)
 {
 	unsigned char head[20];
@@ -373,7 +386,7 @@ static int revert_or_cherry_pick(int argc, const char **argv)
 	read_cache();
 	init_merge_options(&o);
 	o.branch1 = "HEAD";
-	o.branch2 = oneline;
+	o.branch2 = branch_label_to_output_encoding(oneline);
 	head_tree = parse_tree_indirect(head);
 	next_tree = next ? next->tree : empty_tree();
* Re: Git merge conflicts and encoding of logs
From: Junichi Uekawa @ 2008-12-23 23:37 UTC
To: Junio C Hamano; +Cc: Johannes Sixt, Junichi Uekawa, git
At Tue, 23 Dec 2008 02:04:57 -0800,
Junio C Hamano wrote:
>
> Johannes Sixt <j.sixt@viscovery.net> writes:
>
> > Junio C Hamano wrote:
> >> <<<<<<< HEAD:foo
> >> これはサイドブランチの変更です。
> >> やはり JIS コードで書いてます。
> >> =======
> >> 日本語のファイルです。
> >> JIS コードで書いてます。
> >> >>>>>>> master:foo
> >>
> >> The above will probably come out as UTF-8 in this mail text, but the point
> >> is that the conflict side markers do not have anything but the filename and
> >> the branch name. I am still scratching my head trying to see where in the
> >> merge-recursive codepath you got a snippet of the log message.
> >
> > Try rebase -i instead of merge: This should put summary lines onto the
> > conflict markers.
>
> Ah, that's cherry-pick.
>
> The fix should be around the area this weather-balloon patch touches.
>
> Note that this does not work correctly yet, and it seems that the string is
> truncated somewhere.
Hi, I've read the patches from Alexander Gavrilov that try to support a
per-file encoding in gitattributes (encoding); I assume such features
are not in yet.
My git repository has a mixture of iso-2022-jp (platex source), EUC-JP
(some graph source code and other things) and UTF-8, so unifying the log
output encoding to iso-2022-jp is not ideal. With that approach, I'd have
to guess which file is going to have a conflict during cherry-pick and set
the log output encoding to match.
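
Until something per-file exists, one workaround outside git might be to re-encode only the marker lines of a conflicted file after the fact. The sketch below is not a git feature: fix_markers is a hypothetical helper, it assumes the marker labels are UTF-8, that the caller knows the encoding of the file, and that iconv(1) supports that encoding. The conflicted file is built by hand as a stand-in for real cherry-pick output.

```shell
#!/bin/sh
# Sketch only: fix_markers and all file names are hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for a conflicted file: ISO-2022-JP payload, UTF-8 label.
{
    printf '<<<<<<< HEAD\n'
    printf '日本語のファイルです。\n' | iconv -f UTF-8 -t ISO-2022-JP
    printf '=======\n'
    printf 'サイドブランチ\n' | iconv -f UTF-8 -t ISO-2022-JP
    printf '>>>>>>> abc1234... 日本語のログ\n'
} > conflicted.tex

# fix_markers FILE ENCODING
# Re-encode the <<<<<<< and >>>>>>> lines from UTF-8 to ENCODING and
# pass every other line through untouched.
fix_markers() {
    while IFS= read -r line; do
        case $line in
            '<<<<<<<'*|'>>>>>>>'*)
                printf '%s\n' "$line" | iconv -f UTF-8 -t "$2" ;;
            *)
                printf '%s\n' "$line" ;;
        esac
    done < "$1"
}

fix_markers conflicted.tex ISO-2022-JP > fixed.tex

# The result should now decode as a single-encoding file.
iconv -f ISO-2022-JP -t UTF-8 fixed.tex > fixed.utf8
cat fixed.utf8
```

This still requires knowing the encoding of each file, so it does not remove the guessing described above; it only moves the decision to the moment the conflict has already appeared.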
regards,
junichi
--
dancer@{netfort.gr.jp,debian.org}