From: Dmitry Potapov <dpotapov@gmail.com>
To: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Cc: Robert Buck <buck.robert.j@gmail.com>,
"git@vger.kernel.org List" <git@vger.kernel.org>,
msysGit <msysgit@googlegroups.com>
Subject: utf8 BOM
Date: Fri, 14 May 2010 14:16:48 +0400
Message-ID: <20100514101648.GB6212@dpotapov.dyndns.org>
In-Reply-To: <014C9B00-800C-465D-A0B9-98BEEB7D7A96@gmail.com>
On Thu, May 13, 2010 at 01:47:45PM +0200, Eyvind Bernhardsen wrote:
>
> I just did a quick test with a plain text file; it was detected as
> text both with and without a utf8 BOM. Looking at the code,
> characters >= 128 are considered printable so the BOM shouldn't make
> any difference at all. Do you have an example utf8 text file that is
> misdetected as binary?
Though a UTF-8 BOM does not present any problem for the automatic text
detector, it is another piece from Microsoft that creates some
interoperability issues when you work with non-ASCII text files.
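
To illustrate why the BOM does not confuse the detector, here is a
simplified sketch of a printable/non-printable heuristic of the kind
described in the quoted text. It is not git's actual code: the function
name, the list of allowed control characters, the NUL shortcut, and the
1/128 threshold are all assumptions made for this example. The only
point taken from the thread is that bytes >= 128 count as printable.

```python
def looks_like_text(data: bytes) -> bool:
    """Simplified text heuristic (illustrative only, not git's code)."""
    printable = 0
    nonprintable = 0
    for byte in data:
        if byte == 0:
            # Assumption: a NUL byte is treated as a sure sign of binary data.
            return False
        if byte >= 128:
            # High bytes, including the BOM bytes EF BB BF, count as printable.
            printable += 1
        elif byte in (0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B) or 32 <= byte < 127:
            # Common control characters and printable ASCII.
            printable += 1
        else:
            nonprintable += 1
    # Assumed threshold: tolerate at most one odd byte per 128 printable ones.
    return nonprintable * 128 <= printable


sample = b"hello, world\n"
print(looks_like_text(sample))
print(looks_like_text(b"\xef\xbb\xbf" + sample))
```

Under these assumptions both calls give the same answer, because the
three BOM bytes only increase the printable count.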
In short:
1. Microsoft editors and tools like to add a UTF-8 BOM to files, and
you cannot turn this behavior off.
2. Many tools (such as the Microsoft compiler) are incapable of recognizing
UTF-8 files without a BOM, so they screw up all non-ASCII chars.
#1 is a problem because it creates changes consisting solely of adding
a UTF-8 BOM. Moreover, users of non-Windows platforms are not exactly
thrilled with having a UTF-8 BOM at the beginning of every text file.
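
A minimal sketch of what such a BOM-only change looks like at the byte
level. The helper names are hypothetical and chosen for this example;
the only fact relied on is that the UTF-8 BOM is the three bytes
EF BB BF, available in Python as codecs.BOM_UTF8.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def without_bom(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM removed, if there is one."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def is_bom_only_change(old: bytes, new: bytes) -> bool:
    """True if old and new differ, but only by a leading UTF-8 BOM."""
    return old != new and without_bom(old) == without_bom(new)


before = b"int main(void) { return 0; }\n"
after = UTF8_BOM + before  # what an editor that insists on a BOM would save
print(is_bom_only_change(before, after))
```

A check like this could be used to spot commits whose only effect is
adding or removing the BOM.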
Probably, the ability to automatically add a UTF-8 BOM on Windows to text
files (those marked as "unicode") could be helpful, but it is just one part
of the problem of how to deal with text files in "legacy" encodings,
which are still widely used on Windows.
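
As a rough sketch of the idea, the conversion could look like the
following: store BOM-less UTF-8 in the repository and produce the
platform's preferred form on checkout. This is not an existing git
feature. The function names, the add_bom flag, and the choice of cp1251
as the "legacy" encoding are assumptions for illustration only. The
"utf-8-sig" codec is Python's standard codec that writes a BOM when
encoding and skips one when decoding.

```python
import codecs


def to_worktree(repo_bytes: bytes, add_bom: bool) -> bytes:
    """Checkout direction: optionally prepend a BOM (e.g. on Windows)."""
    text = repo_bytes.decode("utf-8-sig")  # tolerates a stray BOM in the repo
    return text.encode("utf-8-sig" if add_bom else "utf-8")


def to_repo(worktree_bytes: bytes, legacy_encoding: str = "cp1251") -> bytes:
    """Commit direction: always store BOM-less UTF-8."""
    if worktree_bytes.startswith(codecs.BOM_UTF8):
        return worktree_bytes[len(codecs.BOM_UTF8):]
    try:
        worktree_bytes.decode("utf-8")  # already valid UTF-8: keep as is
        return worktree_bytes
    except UnicodeDecodeError:
        # Assumption: anything else is in the configured legacy encoding.
        return worktree_bytes.decode(legacy_encoding).encode("utf-8")


original = "\u043f\u0440\u0438\u0432\u0435\u0442\n".encode("utf-8")
checked_out = to_worktree(original, add_bom=True)
print(to_repo(checked_out) == original)
```

Guessing the legacy encoding from a failed UTF-8 decode is fragile,
which is part of why the general problem is harder than the BOM alone.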
Dmitry
Thread overview: 27+ messages
2010-05-12 23:00 [PATCH v3 0/5] End-of-line normalization, redesigned Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 1/5] autocrlf: Make it work also for un-normalized repositories Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 2/5] Add tests for per-repository eol normalization Eyvind Bernhardsen
2010-05-12 23:00 ` [PATCH v3 3/5] Add " Eyvind Bernhardsen
2010-05-12 23:00 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
2010-05-13 1:38 ` Linus Torvalds
2010-05-13 9:39 ` Robert Buck
2010-05-13 9:58 ` Robert Buck
2010-05-13 11:47 ` Eyvind Bernhardsen
2010-05-13 13:19 ` Robert Buck
2010-05-14 10:16 ` Dmitry Potapov [this message]
2010-05-15 20:23 ` utf8 BOM Eyvind Bernhardsen
2010-05-16 5:19 ` Dmitry Potapov
2010-05-16 10:37 ` Eyvind Bernhardsen
2010-05-16 11:26 ` Tait
2010-05-16 13:32 ` Dmitry Potapov
2010-05-13 10:59 ` [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv" Eyvind Bernhardsen
2010-05-13 21:45 ` Linus Torvalds
2010-05-14 2:34 ` Robert Buck
2010-05-14 4:56 ` Jonathan Nieder
2010-05-14 21:21 ` Eyvind Bernhardsen
2010-05-14 21:32 ` Eyvind Bernhardsen
2010-05-14 21:16 ` Eyvind Bernhardsen
2010-05-14 21:27 ` Linus Torvalds
2010-05-15 20:47 ` [PATCH] Add "core.eol" variable to control end-of-line conversion Eyvind Bernhardsen
2010-05-16 10:39 ` Robert Buck
2010-05-12 23:00 ` [RFC/PATCH v3 5/5] Rename "core.autocrlf" config variable as "core.eolconv" Eyvind Bernhardsen