From: "Torsten Bögershausen" <tboegi@web.de>
To: lars.schneider@autodesk.com
Cc: git@vger.kernel.org, gitster@pobox.com, j6t@kdbg.org,
sunshine@sunshineco.com, peff@peff.net,
ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
pclouds@gmail.com, Lars Schneider <larsxschneider@gmail.com>
Subject: Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute
Date: Sun, 18 Mar 2018 08:24:35 +0100 [thread overview]
Message-ID: <20180318072435.GA24190@tor.lan> (raw)
In-Reply-To: <20180309173536.62012-7-lars.schneider@autodesk.com>
Some comments inline
On Fri, Mar 09, 2018 at 06:35:32PM +0100, lars.schneider@autodesk.com wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> Git recognizes files encoded with ASCII or one of its supersets (e.g.
> UTF-8 or ISO-8859-1) as text files. All other encodings are usually
> interpreted as binary and consequently built-in Git text processing
> tools (e.g. 'git diff') as well as most Git web front ends do not
> visualize the content.
>
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
Minor comment:
"Git converts the content"
Everywhere else (?) "encodes or reencodes" is used.
"Git reencodes the content" may be more consistent.
[No comments on the .gitattributes]
>
> diff --git a/convert.c b/convert.c
> index b976eb968c..aa59ecfe49 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -7,6 +7,7 @@
> #include "sigchain.h"
> #include "pkt-line.h"
> #include "sub-process.h"
> +#include "utf8.h"
>
> /*
> * convert.c - convert a file when checking it out and checking it in.
> @@ -265,6 +266,78 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
>
> }
>
> +static const char *default_encoding = "UTF-8";
> +
> +static int encode_to_git(const char *path, const char *src, size_t src_len,
> + struct strbuf *buf, const char *enc, int conv_flags)
> +{
> + char *dst;
> + int dst_len;
> + int die_on_error = conv_flags & CONV_WRITE_OBJECT;
> +
> + /*
> + * No encoding is specified or there is nothing to encode.
> + * Tell the caller that the content was not modified.
> + */
> + if (!enc || (src && !src_len))
> + return 0;
(This may have been discussed before.
As we checked (enc != NULL) I think we can add here:)
if (is_encoding_utf8(enc))
return 0;
> +
> + /*
> + * Looks like we got called from "would_convert_to_git()".
> + * This means Git wants to know if it would encode (= modify!)
> + * the content. Let's answer with "yes", since an encoding was
> + * specified.
> + */
> + if (!buf && !src)
> + return 1;
> +
> + dst = reencode_string_len(src, src_len, default_encoding, enc,
> + &dst_len);
> + if (!dst) {
> + /*
> + * We could add the blob "as-is" to Git. However, on checkout
> + * we would try to reencode to the original encoding. This
> + * would fail and we would leave the user with a messed-up
> + * working tree. Let's try to avoid this by screaming loud.
> + */
> + const char* msg = _("failed to encode '%s' from %s to %s");
> + if (die_on_error)
> + die(msg, path, enc, default_encoding);
> + else {
> + error(msg, path, enc, default_encoding);
> + return 0;
> + }
> + }
> +
> + strbuf_attach(buf, dst, dst_len, dst_len + 1);
> + return 1;
> +}
> +
> +static int encode_to_worktree(const char *path, const char *src, size_t src_len,
> + struct strbuf *buf, const char *enc)
> +{
> + char *dst;
> + int dst_len;
> +
> + /*
> + * No encoding is specified or there is nothing to encode.
> + * Tell the caller that the content was not modified.
> + */
> + if (!enc || (src && !src_len))
> + return 0;
Same as above:
if (is_encoding_utf8(enc))
return 0;
> +
> + dst = reencode_string_len(src, src_len, enc, default_encoding,
> + &dst_len);
> + if (!dst) {
> + error("failed to encode '%s' from %s to %s",
> + path, default_encoding, enc);
> + return 0;
> + }
> +
> + strbuf_attach(buf, dst, dst_len, dst_len + 1);
> + return 1;
> +}
> +
> static int crlf_to_git(const struct index_state *istate,
> const char *path, const char *src, size_t len,
> struct strbuf *buf,
> @@ -978,6 +1051,25 @@ static int ident_to_worktree(const char *path, const char *src, size_t len,
> return 1;
> }
>
> +static const char *git_path_check_encoding(struct attr_check_item *check)
> +{
> + const char *value = check->value;
> +
> + if (ATTR_UNSET(value) || !strlen(value))
> + return NULL;
> +
> + if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
> + error(_("working-tree-encoding attribute requires a value"));
> + return NULL;
> + }
TRUE or false are values, but just wrong ones.
If this test is removed, the user will see "failed to encode "TRUE" to "UTF-8",
which should give enough information to fix it.
> +
> + /* Don't encode to the default encoding */
> + if (!strcasecmp(value, default_encoding))
> + return NULL;
Same as above ?:
if (is_encoding_utf8(value))
return 0;
next prev parent reply other threads:[~2018-03-18 7:25 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-09 17:35 [PATCH v11 00/10] convert: add support for different encodings lars.schneider
2018-03-09 17:35 ` [PATCH v11 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-03-09 17:35 ` [PATCH v11 02/10] strbuf: add xstrdup_toupper() lars.schneider
2018-03-09 17:35 ` [PATCH v11 03/10] strbuf: add a case insensitive starts_with() lars.schneider
2018-03-09 17:35 ` [PATCH v11 04/10] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-03-09 17:35 ` [PATCH v11 05/10] utf8: add function to detect a missing " lars.schneider
2018-03-09 17:35 ` [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute lars.schneider
2018-03-09 19:10 ` Junio C Hamano
2018-03-15 21:23 ` Lars Schneider
2018-03-18 7:24 ` Torsten Bögershausen [this message]
2018-04-01 13:24 ` Lars Schneider
2018-04-05 16:41 ` Torsten Bögershausen
2018-04-15 16:54 ` Lars Schneider
2018-03-09 17:35 ` [PATCH v11 07/10] convert: check for detectable errors in UTF encodings lars.schneider
2018-03-09 19:00 ` Junio C Hamano
2018-03-09 19:04 ` Lars Schneider
2018-03-09 19:10 ` Junio C Hamano
2018-03-09 17:35 ` [PATCH v11 08/10] convert: advise canonical UTF encoding names lars.schneider
2018-03-09 19:11 ` Junio C Hamano
2018-03-15 22:42 ` Lars Schneider
2018-03-09 17:35 ` [PATCH v11 09/10] convert: add tracing for 'working-tree-encoding' attribute lars.schneider
2018-03-09 17:35 ` [PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-03-09 20:18 ` Eric Sunshine
2018-03-09 20:22 ` Junio C Hamano
2018-03-09 20:27 ` Eric Sunshine
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180318072435.GA24190@tor.lan \
--to=tboegi@web.de \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=j6t@kdbg.org \
--cc=lars.schneider@autodesk.com \
--cc=larsxschneider@gmail.com \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=ramsay@ramsayjones.plus.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.