From: Junio C Hamano <gitster@pobox.com>
To: Tsugikazu Shibata <tshibata@ab.jp.nec.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC/PATCH] fix "git diff" to create wrong UTF-8 text
Date: Tue, 01 Jan 2008 21:26:54 -0800 [thread overview]
Message-ID: <7v1w904x29.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: 20080102.082014.02281301.tshibata@ab.jp.nec.com
Tsugikazu Shibata <tshibata@ab.jp.nec.com> writes:
> I believe this should be work for another language using UTF-8 and
> solve this issue.
> ...
> @@ -368,6 +394,7 @@ int xdl_emit_hunk_hdr(long s1, long c1,
> buf[nb++] = ' ';
> if (funclen > sizeof(buf) - nb - 1)
> funclen = sizeof(buf) - nb - 1;
> + funclen = utf8width(func, funclen);
> memcpy(buf + nb, func, funclen);
> nb += funclen;
> }
I'd rather not do this in xdiff/ level for two reasons.
We consider the functions there strictly "borrowed code", and
ideally I'd rather even not to have that "chop down to funclen"
logic at that level.
The code at that level does not know what paths it is dealing
with and cannot consult git specific data (i.e. attributes); it
would make it harder to enhance it later by introducing per-path
encoding information.
How about...
* (optional) lift the funclen limit from xdl_emit_hunk_hdr()
and xdl_emit_diff() and have them preserve the full line;
* Around ll.554 in diff.c::fn_out_consume(), look at
line[i..eol] and apply the "chomp between character" logic
there. I think it is very sensible to use the UTF-8 logic by
default as you did above (but I suspect you may be able to
reuse helper functions in utf8.c such as git_wcwidth(),
instead of rolling your own). Chomp the funcname line given
from xdiff layer in this function.
* (future) make the length of the funcname line configurable
either from the command line or configuration.
* (future) add per-path blob encoding information (default to
UTF-8) to struct emit_callback, and initialize it from the
gitattributes(5) mechanism, just like we have added ws_rule
recently there. Use that to decide how inter-character chomp
works to customize the logic in diff.c::fn_out_consume() you
would introduce above.
I think the two future enhancements I listed above as examples
would be cleaner to implement if the line chomping is done in
fn_out_consume().
next prev parent reply other threads:[~2008-01-02 5:27 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-01 23:20 [RFC/PATCH] fix "git diff" to create wrong UTF-8 text Tsugikazu Shibata
2008-01-02 5:26 ` Junio C Hamano [this message]
2008-01-02 9:49 ` [PATCH 1/2] utf8_width(): allow non NUL-terminated input Junio C Hamano
2008-01-02 9:50 ` [PATCH 2/2] diff: do not chomp hunk-header in the middle of a character Junio C Hamano
2008-01-02 9:50 ` [PATCH 3/2] attribute "coding": specify blob encoding Junio C Hamano
2008-01-03 21:23 ` しらいしななこ
2008-01-03 21:54 ` Junio C Hamano
2008-01-04 16:16 ` Tsugikazu Shibata
2008-01-04 20:53 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7v1w904x29.fsf@gitster.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=tshibata@ab.jp.nec.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).