From: Karsten Blees <karsten.blees@gmail.com>
To: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Cc: "Carlos Martín Nieto" <cmn@elego.de>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v2 0/4] UTF8 BOM follow-up
Date: Sat, 18 Apr 2015 00:44:50 +0200
Message-ID: <55318CE2.1000706@gmail.com>
In-Reply-To: <1429209548-32297-1-git-send-email-gitster@pobox.com>
On 16.04.2015 at 20:39, Junio C Hamano wrote:
> This is on top of the ".gitignore can start with UTF8 BOM" patch
> from Carlos.
>
> Second try; the first patch is new to clarify the logic in the
> codeflow after Carlos's patch, and the second one has been adjusted
> accordingly.
>
> Junio C Hamano (4):
> add_excludes_from_file: clarify the bom skipping logic
> utf8-bom: introduce skip_utf8_bom() helper
> config: use utf8_bom[] from utf.[ch] in git_parse_source()
> attr: skip UTF8 BOM at the beginning of the input file
>
Wouldn't it be better to just strip the BOM on commit, e.g. via a clean filter or pre-commit hook (as suggested in [1])? Or is this patch series only meant to supplement such a solution (i.e. only strip the BOM when reading files from the working-copy rather than the committed tree)?
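For comparison with the reader-side approach of the series, the clean-filter route from [1] needs very little code. The following is a minimal sketch, not code from Git or from the patch series. The names `utf8_bom_length()` and `strip_bom_stream()` are hypothetical. It drops one leading EF BB BF sequence and copies everything else through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* UTF-8 encoding of U+FEFF, the byte order mark. */
static const unsigned char UTF8_BOM[3] = { 0xEF, 0xBB, 0xBF };

/* Return 3 if buf starts with a UTF-8 BOM, 0 otherwise. */
static size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
	if (len >= sizeof(UTF8_BOM) &&
	    memcmp(buf, UTF8_BOM, sizeof(UTF8_BOM)) == 0)
		return sizeof(UTF8_BOM);
	return 0;
}

/*
 * Copy `in` to `out`, dropping one leading UTF-8 BOM if present.
 * Returns 0 on success, -1 on a read or write error.
 */
static int strip_bom_stream(FILE *in, FILE *out)
{
	unsigned char head[sizeof(UTF8_BOM)];
	unsigned char chunk[8192];
	size_t n, skip;

	assert(in != NULL && out != NULL);

	/* Only the first (up to) 3 bytes can hold the BOM. */
	n = fread(head, 1, sizeof(head), in);
	skip = utf8_bom_length(head, n);
	if (fwrite(head + skip, 1, n - skip, out) != n - skip)
		return -1;

	/* Pass the rest of the file through unchanged. */
	while ((n = fread(chunk, 1, sizeof(chunk), in)) > 0)
		if (fwrite(chunk, 1, n, out) != n)
			return -1;
	return ferror(in) ? -1 : 0;
}
```

A tiny `main` that calls `strip_bom_stream(stdin, stdout)` could then be registered as a clean filter, e.g. with `git config filter.stripbom.clean /path/to/stripbom` and a `.gitignore filter=stripbom` line in `.gitattributes`. The driver name `stripbom` and the path are made up. On Windows, stdin/stdout would probably need to be switched to binary mode so that line endings pass through untouched.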
According to rfc3629 chapter 6 [2], the use of a BOM as encoding signature should be forbidden if the encoding is *known* to be always UTF-8. And .gitignore, .gitattributes and .gitmodules contain path names, which are always UTF-8 as of Git for Windows v1.7.10.
IOW, allowing a BOM would mean that files *without* a BOM are *not* UTF-8 and need to be decoded from e.g. the system encoding (which unfortunately cannot be set to UTF-8 on Windows). But this makes no sense, as the repository would not be portable. E.g. a .gitattributes file created on a Greek Windows, containing Greek path names in Cp1253, would not work on platforms with a different system encoding.
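To make the portability problem concrete: the Greek letter α (U+03B1) is the single byte 0xE1 in Cp1253 but the two bytes 0xCE 0xB1 in UTF-8. A pattern saved in Cp1253 therefore cannot match the UTF-8 path names Git stores. It is not even valid UTF-8, as a strict check like the following shows. This is an illustrative sketch; `is_valid_utf8()` is a hypothetical helper, not a Git function.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if s[0..len) is well-formed UTF-8: no truncated or
 * overlong sequences, no surrogates, nothing above U+10FFFF.
 */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t extra;          /* number of continuation bytes */
		unsigned long cp;      /* decoded code point */

		if (c < 0x80) {        /* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			extra = 1; cp = c & 0x1F;
		} else if ((c & 0xF0) == 0xE0) {
			extra = 2; cp = c & 0x0F;
		} else if ((c & 0xF8) == 0xF0) {
			extra = 3; cp = c & 0x07;
		} else {
			return false;  /* stray continuation or invalid lead byte */
		}

		if (len - i - 1 < extra)
			return false;  /* sequence truncated at end of buffer */
		for (size_t k = 1; k <= extra; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		/* Reject overlong encodings, surrogates and out-of-range values. */
		if ((extra == 1 && cp < 0x80) ||
		    (extra == 2 && cp < 0x800) ||
		    (extra == 3 && cp < 0x10000) ||
		    (cp >= 0xD800 && cp <= 0xDFFF) ||
		    cp > 0x10FFFF)
			return false;

		i += extra + 1;
	}
	return true;
}
```

With this, `is_valid_utf8((const unsigned char *)"\xCE\xB1", 2)` is true, while the lone Cp1253 byte `"\xE1"` is rejected as a truncated sequence.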
On the other hand, just ignoring the BOM (as this patch series does) leaves us with two alternative binary representations of the same file content, i.e. we'll eventually end up with spurious first-line changes as users add or remove BOMs in committed .git[ignore|attributes|modules] files, depending on their editor preference...
For local files (.gitconfig, .git/info/exclude, .git/COMMIT_EDITMSG...), auto-detecting encoding based on the presence of a BOM makes somewhat more sense. However, this will most likely break editors that follow the recommendation of the Unicode specification ("Use of a BOM is neither required nor recommended for UTF-8" [3]). So we'd probably need a core.editorEncoding or core.editorUseBom setting to tell git whether "no BOM" means UTF-8 or system encoding...
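The decision such a setting would have to drive is small. The sketch below is purely illustrative. `core.editorUseBom` is only the name floated above, not an existing Git option, and `guess_local_encoding()` and `enum local_encoding` are hypothetical. A leading BOM always means UTF-8 and is skipped. Without one, the setting decides between UTF-8 and the system encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum local_encoding {
	LOCAL_ENC_UTF8,    /* decode as UTF-8 */
	LOCAL_ENC_SYSTEM   /* decode from the system code page */
};

/*
 * Classify a local file (e.g. .git/COMMIT_EDITMSG) by its first bytes.
 * editor_uses_bom stands in for the hypothetical core.editorUseBom:
 * true if the user's editor always writes a BOM when saving UTF-8.
 * *skip receives the number of leading bytes to drop before parsing.
 */
static enum local_encoding guess_local_encoding(const char *buf, size_t len,
						bool editor_uses_bom,
						size_t *skip)
{
	static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

	if (len >= sizeof(bom) && memcmp(buf, bom, sizeof(bom)) == 0) {
		*skip = sizeof(bom);
		return LOCAL_ENC_UTF8;
	}
	*skip = 0;
	/*
	 * No BOM: a BOM-writing editor would have added one for UTF-8, so
	 * the file is presumably in the system encoding; an editor that
	 * follows the Unicode recommendation (no BOM) means plain UTF-8.
	 */
	return editor_uses_bom ? LOCAL_ENC_SYSTEM : LOCAL_ENC_UTF8;
}
```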
Just as a reminder: we should update the Git for Windows Unicode document [4] if we improve support for BOM-adamant editors.
Cheers,
Karsten
[1] http://stackoverflow.com/questions/27223985/git-ignore-bom-prevent-git-diff-from-showing-byte-order-mark-changes
[2] https://tools.ietf.org/html/rfc3629
[3] http://www.unicode.org/versions/Unicode7.0.0/ch02.pdf p.40
[4] https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#editor