All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Torsten Bögershausen" <tboegi@web.de>
To: "Adrián Gimeno Balaguer" <adrigibal@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: git-rebase is ignoring working-tree-encoding
Date: Thu, 8 Nov 2018 18:02:30 +0100	[thread overview]
Message-ID: <20181108170230.GA6652@tor.lan> (raw)
In-Reply-To: <CADN+U_N345aMaiN4CT-_qsecw2gv=8-r+Hqq+CNz-xOx2KGYzg@mail.gmail.com>

On Wed, Nov 07, 2018 at 05:38:18AM +0100, Adrián Gimeno Balaguer wrote:
> Hello Torsten,
> 
> Thanks for answering.
> 
> Answering to your question, I removed the comments with "rebase" since
> my reported encoding issue happens on more simpler operations
> (described in the PR), and the problem is not directly related to
> rebasing, so I considered it better in order to avoid unrelated
> confusions.
> 
> Let's get back to the problem. Each system has a default endianness.
> Also, in .gitattributes's working-tree-encoding, Git behaves
> differently depending on the attribute's value and the contents of the
> referenced entry file. When I put the value "UTF-16", then the file
> must have a BOM, or Git complains. Otherwise, if I put the value
> "UTF-16BE" or "UTF-16LE", then Git prohibites operations if file has a
> BOM for that main encoding (UTF-16 here), which can be relate to any
> endianness.
> 
> My very initial goal was, given a UTF-16LE file, to be able to view
> human-readable diffs whenever I make a change on it (and yes, it must
> be Little Endian). Plus, this file had a BOM. Now, what are the
> options with Git currently (consider only working-tree-encoding)? If I
> put working-tree-encoding=UTF-16, then I could view readable diffs and
> commit the file, but here is the main problem: Git looses information
> about what initial endianness the file had, therefore, after
> staging/committing it re-encodes the file from UTF-8 (as stored
> internally) to UTF-16 and the default system endianness. In my case it
> did to Big Endian, thus affecting the project's requirement. That is
> why I ended up writing a fixup script to change the encoding back to
> UTF-16LE.

OK, I think I understand your problem now.
The file format which you ask for could be named "UTF-16-BOM-LE",
but that does not exist in reality.
If you use UTF-16, then there must be a BOM, and if there is a BOM,
then a Unicode-aware application -should- be able to handle it.

Why does your project require such a format ?

> 
> On the other hand, once I set working-tree-encoding=UTF-16LE, then Git
> prohibited me from committing the file and even viewing human-readable
> diffs (the output simply tells it's a binary file). In this sense, the
> internal location of these  errors is within the function of utf8.c I
> made changes to in the PR. I hope I was clearer!
> 
> Finally, Git behaviour around this is based on Unicode standards,
> which is why I acknowledged that my changes violated them after
> refering to a link which is present in the ut8.h file.

[]

  reply	other threads:[~2018-11-08 17:07 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-02  2:30 git-rebase is ignoring working-tree-encoding Adrián Gimeno Balaguer
2018-11-04 15:47 ` brian m. carlson
2018-11-04 16:37   ` Adrián Gimeno Balaguer
2018-11-04 18:38     ` brian m. carlson
2018-11-04 17:07 ` Torsten Bögershausen
2018-11-05  4:24   ` Adrián Gimeno Balaguer
2018-11-05 18:10     ` Torsten Bögershausen
2018-11-06 20:16       ` Torsten Bögershausen
2018-11-07  4:38         ` Adrián Gimeno Balaguer
2018-11-08 17:02           ` Torsten Bögershausen [this message]
2018-12-26  0:56             ` Alexandre Grigoriev
2018-12-26 19:25               ` brian m. carlson
2018-12-27  2:52                 ` Alexandre Grigoriev
2018-12-27 14:45                   ` Torsten Bögershausen
2018-12-23 14:46   ` Alexandre Grigoriev
2018-12-29 11:09 ` [PATCH/RFC v1 1/1] Support working-tree-encoding "UTF-16LE-BOM" tboegi
     [not found]   ` <CADN+U_OccLuLN7_0rjikDgLT+Zvt8hka-=xsnVVLJORjYzP78Q@mail.gmail.com>
2018-12-29 15:48     ` Adrián Gimeno Balaguer
2018-12-29 17:54       ` Philip Oakley
2019-01-20 16:43 ` [PATCH v2 " tboegi
2019-01-22 20:13   ` Junio C Hamano
2019-01-30 15:01 ` [PATCH v3 " tboegi
2019-01-30 15:24   ` Jason Pyeron
2019-01-30 17:49     ` Torsten Bögershausen
2019-03-06  5:23 ` [PATCH v1 1/1] gitattributes.txt: fix typo tboegi
2019-03-07  0:24   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181108170230.GA6652@tor.lan \
    --to=tboegi@web.de \
    --cc=adrigibal@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.