From: Karsten Blees <karsten.blees@gmail.com>
To: Ken Ismert <kismert@gmail.com>
Cc: msysgit@googlegroups.com, git@vger.kernel.org
Subject: Re: Script for handling UTF-16 files
Date: Wed, 10 Apr 2013 20:59:56 +0200
Message-ID: <5165B6AC.5090403@gmail.com>
In-Reply-To: <608a349b-cc71-4cba-9197-3783049e9f47@googlegroups.com>

On 10.04.2013 01:47, Ken Ismert wrote:
> 
> I bumped into the UTF-16 display problem with Git Extensions running on top of msysGit. After lots of searching and experimenting, I came up with a solution that works for me.
> 
> Note: Please see questions below.
> 
> This method is for MSysGit 1.8.1, and is tested on Windows XP. I use Git Extensions 2.44, but since the changes are at the Git level, they should work for Git Gui as well. Steps:

There was a discussion about handling UTF-16 on the git ML a while back; see http://thread.gmane.org/gmane.comp.version-control.git/159708

As suggested there, I would try a clean/smudge filter, i.e. store UTF-16 files as UTF-8 in the repository and convert them back to UTF-16 on checkout. That way git can treat your UTF-16 files as text in most cases: you can merge them, git-grep works, and gitattributes work (eol-conversion, ident-replacement, built-in diff patterns...).
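
A minimal sketch of such a filter setup, not something I have tested on msysGit. It assumes iconv is on the PATH; the filter name "utf16" and the *.txt pattern are made up for illustration:

```ini
# .git/config or ~/.gitconfig
# "utf16" is an arbitrary, hypothetical filter name
[filter "utf16"]
    # clean runs on checkin: working tree (UTF-16) -> repository (UTF-8)
    clean = iconv -f UTF-16 -t UTF-8
    # smudge runs on checkout: repository (UTF-8) -> working tree (UTF-16)
    smudge = iconv -f UTF-8 -t UTF-16

# The filter is attached to paths in a separate .gitattributes file, e.g.:
#   *.txt filter=utf16
```

The byte order and BOM that "-t UTF-16" produces depend on the iconv implementation, so the smudge side may need adjusting to match what your Windows tools expect.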

By contrast, if you use a textconv filter, the stored content stays UTF-16, so most git operations will still treat it as binary.

There's also an 'encoding' attribute and a 'gui.encoding' setting which in theory should solve your issue (i.e. specify the encoding of files for display by GUI tools). I don't know if Git Extensions supports that, or whether it's supposed to work for binary files at all.
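
For reference, those two settings would look roughly like this. The *.txt pattern is only an example, and whether the GUI tools accept UTF-16 as a value here is an assumption I have not verified:

```ini
# .git/config or ~/.gitconfig: default encoding used by gitk / git-gui
[gui]
    encoding = utf-16

# Per-path override, in a separate .gitattributes file, e.g.:
#   *.txt encoding=UTF-16
```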

> 3) Modify the global ~/Git/etc/gitconfig or your local ~/.git/config file, and add these lines:
> 
>     [diff "astextutf16"]
>         textconv = astextutf16

Why not simply "textconv = iconv -f utf-16 -t utf-8", without the extra script?
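
That is, something along these lines, assuming iconv is on the PATH. The driver name "utf16" and the *.txt pattern are placeholders, not anything git predefines:

```ini
# .git/config or ~/.gitconfig
# "utf16" is a hypothetical diff driver name
[diff "utf16"]
    # git appends the path of a temporary file holding the blob
    # and reads the converted text from stdout
    textconv = iconv -f utf-16 -t utf-8

# Attach the driver in a separate .gitattributes file, e.g.:
#   *.txt diff=utf16
```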

> c) I had success with iconv, but is there any built-in UTF-16 to UTF-8 converter that ships with msysGit?

There are ready-to-use UTF-conversion functions in the codebase, but these are not accessible as a git command or built-in filter.

> As a quick fix, how hard would it be to add a 'utf16' diff filter, similar to cpp or csharp? Or is this simply the wrong place to put in a work-around?

As described above, I think a diff filter is not the right tool for the job. The only universal format for text content that works reasonably well with established text-based technologies (merge algorithms, regex etc.) is UTF-8. If we want to benefit from these technologies, git should store text files as UTF-8 and convert from / to platform-specific formats on checkin / checkout or for display.

Bye,
Karsten

