From: "Torsten Bögershausen" <tboegi@web.de>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: Git List <git@vger.kernel.org>, Johannes Sixt <j6t@kdbg.org>,
Eric Sunshine <sunshine@sunshineco.com>,
Jeff King <peff@peff.net>,
Ramsay Jones <ramsay@ramsayjones.plus.com>,
Johannes.Schindelin@gmx.de
Subject: Re: [PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding
Date: Tue, 30 Jan 2018 15:40:02 +0100 [thread overview]
Message-ID: <20180130144002.GA30211@tor.lan> (raw)
In-Reply-To: <55B6C3D5-4131-4636-AD0E-20759EDBE8CD@gmail.com>
On Tue, Jan 30, 2018 at 12:23:47PM +0100, Lars Schneider wrote:
>
> > On 29 Jan 2018, at 21:19, tboegi@web.de wrote:
> >
> > From: Torsten Bögershausen <tboegi@web.de>
> >
> > UTF-16 encoded files are treated as "binary" by Git, and no CRLF
> > conversion is done.
> > When the UTF-16 encoded files are converted into UF-8 using the new
> s/UF-8/UTF-8/
>
>
> > "working-tree-encoding", the CRLF are converted if core.autocrlf is true.
> >
> > This may lead to confusion:
> > A tool writes an UTF-16 encoded file with CRLF.
> > The file is commited with core.autocrlf=true, the CLRF are converted into LF.
> > The repo is pushed somewhere and cloned by a different user, who has
> > decided to use core.autocrlf=false.
> > He uses the same tool, and now the CRLF are not there as expected, but LF,
> > make the file useless for the tool.
> >
> > Avoid this (possible) confusion by ignoring core.autocrlf for all files
> > which have "working-tree-encoding" defined.
>
> Maybe I don't understand your use case but I think this will generate even
> more confusion because that's not what I would expect as a user. I think Git
> should behave consistently independent of the used encoding. Here are my arguments:
To start with: I have probably seen too many repos with CRLF messed up.
>
> (1) Legacy users are *not* affected. If you don't use the "working-tree-encoding"
> attribute then nothing changes for you.
People who don't use "working-tree-encoding" are not affected,
I never ment to state that.
I am thinking about people who use "working-tree-encoding" without thinking
about line endings.
Or the ones that have in mind that core.autocrlf=true will leave the
line endings for UTF-16 encoded files as is, but that changes as soon as they
are converted into UTF-8 and the "auto" check is now done
-after- the conversion. I would find that confusing.
>
> (2) If you use the "working-tree-encoding" attribute *and* you want to ensure
> your file keeps CRLF then you can define that in the attributes too. E.g.:
>
> *.proj textworking-tree-encoding=UTF-16 eol=crlf
That is a good one.
If you ever plan a re-roll (I don't at the moment) the *.proj extemsion
make much more sense in Documentation/gitattributes that *.tx
There no text files encoded in UTF-16 wich are called xxx.txt, but those
are non-ideal examples. *.proj makes good sense as an example.
>
> - Lars
>
>
>
> > The user can still use a .gitattributes file and specify the line endings
> > like "text=auto", "text", or "text eol=crlf" and let that .gitattribute
> > file travel together with push and clone.
> >
> > Change convert.c to e more careful, simplify the initialization when
> > attributes are retrived (and none are specified) and update the documentation.
> >
> > Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> > ---
> > Documentation/gitattributes.txt | 9 ++++++---
> > convert.c | 15 ++++++++++++---
> > 2 files changed, 18 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> > index a8dbf4be3..3665c4677 100644
> > --- a/Documentation/gitattributes.txt
> > +++ b/Documentation/gitattributes.txt
> > @@ -308,12 +308,15 @@ Use the `working-tree-encoding` attribute only if you cannot store a file in
> > UTF-8 encoding and if you want Git to be able to process the content as
> > text.
> >
> > +Note that when `working-tree-encoding` is defined, core.autocrlf is ignored.
> > +Set the `text` attribute (or `text=auto`) to enable CRLF conversions.
> > +
> > Use the following attributes if your '*.txt' files are UTF-16 encoded
> > -with byte order mark (BOM) and you want Git to perform automatic line
> > -ending conversion based on your platform.
> > +with byte order mark (BOM) and you want Git to perform line
> > +ending conversion based on core.eol.
> >
> > ------------------------
> > -*.txt text working-tree-encoding=UTF-16
> > +*.txt working-tree-encoding=UTF-16 text
> > ------------------------
> >
> > Use the following attributes if your '*.txt' files are UTF-16 little
> > diff --git a/convert.c b/convert.c
> > index 13fad490c..e7f11d1db 100644
> > --- a/convert.c
> > +++ b/convert.c
> > @@ -1264,15 +1264,24 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
> > }
> > ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
> > } else {
> > - ca->drv = NULL;
> > - ca->crlf_action = CRLF_UNDEFINED;
> > - ca->ident = 0;
> > + memset(ca, 0, sizeof(*ca));
> > }
> >
> > /* Save attr and make a decision for action */
> > ca->attr_action = ca->crlf_action;
> > if (ca->crlf_action == CRLF_TEXT)
> > ca->crlf_action = text_eol_is_crlf() ? CRLF_TEXT_CRLF : CRLF_TEXT_INPUT;
> > + /*
> > + * Often UTF-16 encoded files are read and written by programs which
> > + * really need CRLF, and it is important to keep the CRLF "as is" when
> > + * files are committed with core.autocrlf=true and the repo is pushed.
> > + * The CRLF would be converted into LF when the repo is cloned to
> > + * a machine with core.autocrlf=false.
> > + * Obey the "text" and "eol" attributes and be independent on the
> > + * local core.autocrlf for all "encoded" files.
> > + */
> > + if ((ca->crlf_action == CRLF_UNDEFINED) && ca->checkout_encoding)
> > + ca->crlf_action = CRLF_BINARY;
> > if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_FALSE)
> > ca->crlf_action = CRLF_BINARY;
> > if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_TRUE)
> > --
> > 2.16.0.rc0.2.g64d3e4d0cc.dirty
> >
>
next prev parent reply other threads:[~2018-01-30 14:40 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-20 15:24 [PATCH v4 0/6] convert: add support for different encodings lars.schneider
2018-01-20 15:24 ` [PATCH v4 1/6] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-01-20 15:24 ` [PATCH v4 2/6] strbuf: add xstrdup_toupper() lars.schneider
2018-01-20 15:24 ` [PATCH v4 3/6] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-01-20 15:24 ` [PATCH v4 4/6] utf8: add function to detect a missing " lars.schneider
2018-01-20 15:24 ` [PATCH v4 5/6] convert: add 'working-tree-encoding' attribute lars.schneider
2018-01-21 14:22 ` Simon Ruderich
2018-01-22 12:35 ` Lars Schneider
2018-01-23 0:54 ` Jeff King
2018-01-23 10:25 ` Simon Ruderich
2018-01-23 16:20 ` Jeff King
2018-01-23 21:12 ` Junio C Hamano
2018-01-23 9:27 ` Simon Ruderich
2018-01-22 18:00 ` SQUASH convert: add tracing for " lars.schneider
2018-01-22 19:37 ` Eric Sunshine
2018-01-23 10:17 ` [PATCH v2] " lars.schneider
2018-01-20 15:24 ` [PATCH v4 6/6] " lars.schneider
2018-01-23 20:19 ` [PATCH v4 0/6] convert: add support for different encodings Torsten Bögershausen
2018-01-23 20:53 ` Junio C Hamano
2018-01-29 20:18 ` [PATCH v5 0/7] " tboegi
2018-01-29 20:18 ` [PATCH v5 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() tboegi
2018-01-29 20:19 ` [PATCH v5 2/7] strbuf: add xstrdup_toupper() tboegi
2018-01-29 20:19 ` [PATCH v5 3/7] utf8: add function to detect prohibited UTF-16/32 BOM tboegi
2018-01-29 20:19 ` [PATCH v5 4/7] utf8: add function to detect a missing " tboegi
2018-01-30 19:15 ` Junio C Hamano
2018-01-30 20:58 ` Lars Schneider
2018-01-30 21:54 ` Junio C Hamano
2018-01-29 20:19 ` [PATCH v5 5/7] convert: add 'working-tree-encoding' attribute tboegi
2018-01-30 20:05 ` Junio C Hamano
2018-01-30 20:31 ` Lars Schneider
2018-01-30 21:56 ` Junio C Hamano
2018-01-31 19:12 ` Lars Schneider
2018-01-31 22:45 ` Junio C Hamano
2018-01-29 20:19 ` [PATCH v5 6/7] convert: add tracing for " tboegi
2018-01-29 20:19 ` [PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding tboegi
2018-01-30 11:23 ` Lars Schneider
2018-01-30 14:40 ` Torsten Bögershausen [this message]
2018-01-30 15:14 ` Lars Schneider
2018-01-31 17:28 ` Torsten Bögershausen
2018-01-31 19:37 ` Lars Schneider
2018-02-02 19:17 ` Junio C Hamano
2018-02-07 6:31 ` Torsten Bögershausen
2018-02-07 18:12 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180130144002.GA30211@tor.lan \
--to=tboegi@web.de \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=j6t@kdbg.org \
--cc=larsxschneider@gmail.com \
--cc=peff@peff.net \
--cc=ramsay@ramsayjones.plus.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.