git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Torsten Bögershausen" <tboegi@web.de>,
	"Lars Schneider" <lars.schneider@autodesk.com>,
	git <git@vger.kernel.org>, "Johannes Sixt" <j6t@kdbg.org>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	peff@peff.net, ramsay@ramsayjones.plus.com,
	Johannes.Schindelin@gmx.de
Subject: Re: [PATCH v7 0/7] convert: add support for different encodings
Date: Sat, 24 Feb 2018 16:18:36 +0100	[thread overview]
Message-ID: <FDF4DEB8-E71A-4BFC-9437-678C8F65BBDC@gmail.com> (raw)
In-Reply-To: <xmqq7er3tqjq.fsf@gitster-ct.c.googlers.com>


> On 23 Feb 2018, at 21:11, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Lars Schneider <larsxschneider@gmail.com> writes:
>> 
>>> I still think it would be nice to see diffs for arbitrary encodings.
>>> Would it be an option to read the `encoding` attribute and use it in
>>> `git diff`?
>> 
>> Reusing that gitk-only thing and suddenly start doing so would break
>> gitk users, no?  The tool expects the diff to come out encoded in
>> the encoding that is specified by that attribute (which is learned
>> from get_path_encoding helper) and does its thing.
>> 
>> I guess that gitk uses diff-tree plumbing and you won't be applying
>> this change to the plumbing, perhaps?  If so, it might not be too
>> bad, but those who decided to postprocess "git diff" output (instead
>> of "git diff-tree" output) mimicking how gitk does it by thinking
>> that is the safe and sane thing to do will be broken by such a
>> change.  You could do "use the encoding only when a command line
>> option says so", but then people will add a configuration variable
>> to turn it always on and these existing scripts will be broken.
>> 
>> I do not personally have much sympathy for the last case (i.e. those
>> who scripted around 'git diff' instead of 'git diff-tree' to get
>> broken), so making the new feature only work with the Porcelain "git
>> diff" might be an option.  I'll need a bit more time to formulate
>> the rest of my thought ;-)
> 
> So we are introducing in this series a way to say in what encoding
> the things should be placed in the working tree files (i.e. the
> w-t-e attribute attached to the paths).  Currently there is no
> mechanism to say what encoding the in-repo contents are and UTF-8 is
> assumed when conversion from/to w-t-e is required, but there is no
> fundamental reason why it shouldn't be customizable (if anything, as
> a piece of fact on the in-repo data, in-repo-encoding is *more*
> appropriate to be an attribute than w-t-e that can merely be project
> preference at best, as I mentioned earlier in this thread).

Correct.


> We always use the in-repo contents when generating 'diff'.  I think
> by "attribute to be used in diff", what you are reallying after is
> to convert the in-repo contents to that encoding _BEFORE_ running
> 'diff' on it.  E.g. in-repo UTF-16 that can have NUL bytes all over
> the place will not diff well with the xdiff machinery, but if you
> first convert it to UTF-8 and have xdiff work on it, you can get
> reasonable result out of it.  It is unclear what encoding you want
> your final diff output in (it is equally valid in such a set-up to
> desire your patch output in UTF-16 or UTF-8), but assuming that you
> want UTF-8 in your patch output, perhaps we do not have to break
> gitk users by hijacking the 'encoding' attribute.  Instead what you
> want is a single bit that says between in-repo or working tree which
> representation should be given to the xdiff machinery.

I fear that we could confuse users with an additional knob/bit that
defines what we diff against. Git always diff'ed against in-repo 
content and I feel it should stay that way.

However, I agree with your earlier emails that "working-tree-encoding"
is just one half of the feature. I also think it would be nice to be
able to define the "in-repo-encoding" as well. Then we could define 
something like that:

    *.foo 		text in-repo-encoding=UTF-16LE

This tells Git that the file is stored as UTF-16LE. This would help Git
generating a diff via UTF-8 conversion. I feel that the final patch 
should be in UTF-16LE again. Maybe over time we could then deprecate the
"encoding" attribute as the "in-repo-encoding" attribute serves a similar 
purpose (maybe gitk can switch to it).

In that case we could also do things like that:

    *.bar 		text working-tree-encoding=SHIFT-JIS in-repo-encoding=UTF-16LE

SHIFT-JIS encoded files would be reencoded to UTF-16LE on checkin.
On checkout the opposite would happen. This way we would lift the
"UTF-8 is the only in-repo encoding" limitation of the current w-t-e
implementation.

Does this sound sensible to you? That being said, I think "in-repo-encoding"
would deserve an own series.

- Lars

  reply	other threads:[~2018-02-24 15:18 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-15 15:27 [PATCH v7 0/7] convert: add support for different encodings lars.schneider
2018-02-15 15:27 ` [PATCH v7 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-02-16 12:55   ` Ævar Arnfjörð Bjarmason
2018-02-16 18:45     ` Jeff King
2018-02-16 19:30       ` Junio C Hamano
2018-02-15 15:27 ` [PATCH v7 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-02-15 15:27 ` [PATCH v7 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-02-15 15:27 ` [PATCH v7 4/7] utf8: add function to detect a missing " lars.schneider
2018-02-15 15:27 ` [PATCH v7 5/7] convert: add 'working-tree-encoding' attribute lars.schneider
2018-02-15 15:27 ` [PATCH v7 6/7] convert: add tracing for " lars.schneider
2018-02-15 15:27 ` [PATCH v7 7/7] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-02-15 20:03 ` [PATCH v7 0/7] convert: add support for different encodings Junio C Hamano
2018-02-15 22:09   ` Jeff King
2018-02-16 18:55     ` Junio C Hamano
2018-02-16 19:25       ` Jeff King
2018-02-16 19:27         ` Jeff King
2018-02-16 19:41           ` Junio C Hamano
2018-02-21 18:06       ` Lars Schneider
2018-02-16 14:42   ` Lars Schneider
2018-02-16 16:58     ` Torsten Bögershausen
2018-02-22 20:00       ` Lars Schneider
2018-02-22 20:12         ` Jeff King
2018-02-23 16:35         ` Junio C Hamano
2018-02-23 20:11           ` Junio C Hamano
2018-02-24 15:18             ` Lars Schneider [this message]
2018-02-26  1:44               ` Jeff King
2018-02-26 17:35                 ` Torsten Bögershausen
2018-02-26 20:46                   ` Jeff King
2018-02-27 21:05                     ` Torsten Bögershausen
2018-02-27 21:25                       ` Jeff King
2018-02-27 21:55                         ` Junio C Hamano
2018-02-27 21:58                           ` Jeff King
2018-02-27 22:10                             ` Junio C Hamano
2018-02-27 22:20                               ` Jeff King
2018-02-28  8:20                         ` Torsten Bögershausen
2018-02-28 13:21                           ` Jeff King
2018-02-28 17:42                             ` Junio C Hamano
2018-03-01  7:49                               ` Jeff King
2018-03-04 10:16                             ` Torsten Bögershausen
2018-02-28 20:46                         ` Lars Schneider
2018-02-16 19:04     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FDF4DEB8-E71A-4BFC-9437-678C8F65BBDC@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=lars.schneider@autodesk.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).