All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Torsten Bögershausen" <tboegi@web.de>
To: lars.schneider@autodesk.com
Cc: git@vger.kernel.org, gitster@pobox.com, j6t@kdbg.org,
	sunshine@sunshineco.com, peff@peff.net,
	ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
	Lars Schneider <larsxschneider@gmail.com>
Subject: Re: [PATCH v3 0/7] convert: add support for different encodings
Date: Sun, 7 Jan 2018 10:38:15 +0100	[thread overview]
Message-ID: <20180107093815.GA7442@tor.lan> (raw)
In-Reply-To: <20180106004808.77513-1-lars.schneider@autodesk.com>

On Sat, Jan 06, 2018 at 01:48:01AM +0100, lars.schneider@autodesk.com wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
> 
> Hi,
> 
> Patches 1-5 and 6 are helper functions and preparation.
> Patch 6 is the actual change.
> 
> I am still torn between "checkout-encoding" and "working-tree-encoding"
> as attribute name. I am happy to hear arguments for/against one or the
> other.

checkout-encoding is probably misleading, as it is even the checkin-encoding.

What is wrong with working-tree-encoding ?
I think the 2 "-".

What was wrong with workingtree-encoding ?
Or
workdir-encoding ?



> 
> Changes since v2:
> 
> * Added Torsten's crlfsave refactoring patch (patch 5)
>   @Torsten: I tried to make the commit message more clean, added
>             some comments to and renamed conv_flags_eol to
>             global_conv_flags_eol.
> 
> * Improved documentation and commit message (Torsten)

Good, thanks.
> 
> * Removed unnecessary NUL assignment in xstrdup_tolower() (Torsten)
> 
> * Set "git config core.eol lf" to made the test run on Windows (Dscho)
> 
> * Made BOM arrays static (Ramsay)


Some comments:

I would like to have the CRLF conversion a little bit more strict -
many users tend to set core.autocrlf=true or write "* text=auto"
in the .gitattributes.
Reading all the effort about BOM markers and UTF-16LE, I think there
should ne some effort to make the line endings round trip.
Therefore I changed convert.c to demand that the "text" attribute
is set to enable CRLF conversions.
(If I had submitted the patch, I would have demanded
"text eol=lf" or "text eol=crlf", but the test case t0028 indicates
that there is a demand to produce line endings as configured in core.eol)

Anyway, I rebased it onto git.git/master, changed the docu, and pushed it to
https://github.com/tboegi/git/tree/180107-0935-For-lars-schneider-encode-V3B

Here is a inter-diff against your version:

 diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
 index 1bc03e69c..b8d9f91c8 100644
 --- a/Documentation/gitattributes.txt
 +++ b/Documentation/gitattributes.txt
 @@ -281,7 +281,7 @@ interpreted as binary and consequently built-in Git text processing
  tools (e.g. 'git diff') as well as most Git web front ends do not
  visualize the content.
  
 -In these cases you can teach Git the encoding of a file in the working
 +In these cases you can tell Git the encoding of a file in the working
  directory with the `checkout-encoding` attribute. If a file with this
  attributes is added to Git, then Git reencodes the content from the
  specified encoding to UTF-8 and stores the result in its internal data
 @@ -308,17 +308,20 @@ Use the `checkout-encoding` attribute only if you cannot store a file in
  UTF-8 encoding and if you want Git to be able to process the content as
  text.
  
 +Note that when `checkout-encoding` is defined, by default the line
 +endings are not converted. `text=auto` and core.autocrlf are ignored.
 +Set the `text` attribute to enable CRLF conversions.
 +
  Use the following attributes if your '*.txt' files are UTF-16 encoded
 -with byte order mark (BOM) and you want Git to perform automatic line
 -ending conversion based on your platform.
 +with byte order mark (BOM).
  
  ------------------------
 -*.txt		text checkout-encoding=UTF-16
 +*.txt		checkout-encoding=UTF-16
  ------------------------
  
  Use the following attributes if your '*.txt' files are UTF-16 little
 -endian encoded without BOM and you want Git to use Windows line endings
 -in the working directory.
 +endian encoded without BOM and you want Git to use LF in the repo and
 +CRLF in the working directory.
  
  ------------------------
  *.txt 		checkout-encoding=UTF-16LE text eol=CRLF
 diff --git a/convert.c b/convert.c
 index 13f766d2a..1e29f515e 100644
 --- a/convert.c
 +++ b/convert.c
 @@ -221,18 +221,27 @@ static void check_global_conv_flags_eol(const char *path, enum crlf_action crlf_
  	}
  }
  
  
  static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
 @@ -432,7 +441,7 @@ static int crlf_to_git(const struct index_state *istate,
  		 * cherry-pick.
  		 */
  		if ((!(conv_flags & CONV_EOL_RENORMALIZE)) &&
 -		    has_cr_in_index(istate, path))
 +		    has_crlf_in_index(istate, path))
  			convert_crlf_into_lf = 0;
  	}
  	if (((conv_flags & CONV_EOL_RNDTRP_WARN) ||
 @@ -1214,9 +1223,28 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
  			ca->crlf_action = git_path_check_crlf(ccheck + 0);
  		ca->ident = git_path_check_ident(ccheck + 1);
  		ca->drv = git_path_check_convert(ccheck + 2);
 +		ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
  		if (ca->crlf_action != CRLF_BINARY) {
  			enum eol eol_attr = git_path_check_eol(ccheck + 3);
 -			if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
 +			if (ca->checkout_encoding) {
 +				enum crlf_action crlf_action = CRLF_BINARY;
 +				/*
 +				 * encoded files don't use auto.
 +				 * 'text' must be specified to
 +				 * do crlf conversions
 +				 */
 +				if (ca->crlf_action == CRLF_TEXT) {
 +					if (eol_attr == EOL_LF)
 +						crlf_action = CRLF_TEXT_INPUT;
 +					else if (eol_attr == EOL_CRLF)
 +						crlf_action = CRLF_TEXT_CRLF;
 +					else if (text_eol_is_crlf())
 +						crlf_action = CRLF_TEXT_CRLF;
 +					else
 +						crlf_action = CRLF_TEXT_INPUT;
 +				}
 +				ca->crlf_action = crlf_action;
 +			} else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
  				ca->crlf_action = CRLF_AUTO_INPUT;
  			else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF)
  				ca->crlf_action = CRLF_AUTO_CRLF;
 @@ -1225,11 +1253,11 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
  			else if (eol_attr == EOL_CRLF)
  				ca->crlf_action = CRLF_TEXT_CRLF;
  		}
 -		ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
  	} else {
  		ca->drv = NULL;
  		ca->crlf_action = CRLF_UNDEFINED;
  		ca->ident = 0;
 +		ca->checkout_encoding = NULL;
  	}
  
  	/* Save attr and make a decision for action */



  parent reply	other threads:[~2018-01-07  9:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-06  0:48 [PATCH v3 0/7] convert: add support for different encodings lars.schneider
2018-01-06  0:48 ` [PATCH v3 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-01-06  0:48 ` [PATCH v3 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-01-06  0:48 ` [PATCH v3 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-01-06  0:48 ` [PATCH v3 4/7] utf8: add function to detect a missing " lars.schneider
2018-01-06  0:48 ` [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags lars.schneider
2018-01-08 21:28   ` Junio C Hamano
2018-01-08 22:47     ` Lars Schneider
2018-01-08 23:17       ` Junio C Hamano
2018-01-09  6:20       ` Torsten Bögershausen
2018-01-06  0:48 ` [PATCH v3 6/7] convert: add support for 'checkout-encoding' attribute lars.schneider
2018-01-06  0:48 ` [PATCH v3 7/7] convert: add tracing for checkout-encoding lars.schneider
2018-01-07  9:38 ` Torsten Bögershausen [this message]
2018-01-08 14:38   ` [PATCH v3 0/7] convert: add support for different encodings Lars Schneider
2018-01-08 18:08     ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180107093815.GA7442@tor.lan \
    --to=tboegi@web.de \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=lars.schneider@autodesk.com \
    --cc=larsxschneider@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.