From: Kevin Bracey <kevin@bracey.fi>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: Peter Krefting <peter@softwolves.pp.se>, git@vger.kernel.org
Subject: Re: [PATCH] Unicode: update of combining code points
Date: Wed, 16 Apr 2014 13:51:43 +0300
Message-ID: <534E60BF.5020602@bracey.fi>
In-Reply-To: <534E0B84.6070602@web.de>
On 16/04/2014 07:48, Torsten Bögershausen wrote:
> On 15.04.14 21:10, Peter Krefting wrote:
>> Torsten Bögershausen:
>>
>>> diff --git a/utf8.c b/utf8.c
>>> index a831d50..77c28d4 100644
>>> --- a/utf8.c
>>> +++ b/utf8.c
>> Is there a script that generates this code from the Unicode database files, or did you hand-update it?
>>
> Some of the code points which have "0 length on the display" are called
> "combining", others are called "vowels" or "accents".
> E.g. 5BF is not marked any of them, but if you look at the glyph, it should
> be combining (please correct me if that is wrong).
Indeed it is combining (more specifically it has General Category
"Nonspacing_Mark" = "Mn").
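For a quick check of a single code point without downloading the data files, Python's standard `unicodedata` module exposes the General Category. This is only a sketch: the module reflects whichever Unicode version the interpreter was built with, not necessarily the latest UCD.

```python
import unicodedata

# U+05BF (HEBREW POINT RAFE) is the code point discussed above.
cp = 0x05BF
category = unicodedata.category(chr(cp))
print("U+%04X %s: %s" % (cp, unicodedata.name(chr(cp)), category))
```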
>
> If I could have found a file which indicates for each code point, what it
> is, I could write a script.
>
The most complete and machine-readable data are in these files:
http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
The general categories can also be seen more legibly in:
http://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedGeneralCategory.txt
For docs, see:
http://www.unicode.org/reports/tr44/
http://www.unicode.org/reports/tr11/
http://www.unicode.org/ucd/
The existing utf8.c comments describe the attributes being selected from
the tables (general categories "Cf","Mn","Me", East Asian Width "W",
"F"). And they suggest that the combining character table was originally
auto-generated from UnicodeData.txt with a "uniset" tool. Presumably this?
https://github.com/depp/uniset
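As a rough illustration of what such a generator could look like (this is not uniset and not the script used for utf8.c), the sketch below selects the "Cf", "Mn" and "Me" categories from UnicodeData.txt-style lines and merges adjacent code points into ranges. The sample lines are abbreviated to the first three fields, and the function names and the output layout are assumptions made for this example.

```python
# Abbreviated sample in the UnicodeData.txt layout (code;name;category;...).
# The real file has more fields per line; only the first three are used here.
SAMPLE = """\
0300;COMBINING GRAVE ACCENT;Mn
0301;COMBINING ACUTE ACCENT;Mn
0488;COMBINING CYRILLIC HUNDRED THOUSANDS SIGN;Me
05BE;HEBREW PUNCTUATION MAQAF;Pd
05BF;HEBREW POINT RAFE;Mn
05C0;HEBREW PUNCTUATION PASEQ;Po
05C1;HEBREW POINT SHIN DOT;Mn
05C2;HEBREW POINT SIN DOT;Mn
200B;ZERO WIDTH SPACE;Cf
"""

# General categories that the utf8.c comments name as zero-width.
ZERO_WIDTH_CATEGORIES = {"Cf", "Mn", "Me"}


def zero_width_code_points(lines):
    """Yield code points whose General Category is in the selected set.

    Note: the "<..., First>" / "<..., Last>" range pairs that
    UnicodeData.txt uses for some large blocks are not expanded here.
    """
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # blank or malformed line
        if fields[2] in ZERO_WIDTH_CATEGORIES:
            yield int(fields[0], 16)


def merge_ranges(code_points):
    """Collapse code points into sorted, inclusive (first, last) ranges."""
    ranges = []
    for cp in sorted(code_points):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp  # extend the current run
        else:
            ranges.append([cp, cp])  # start a new run
    return [tuple(r) for r in ranges]


def format_c_table(ranges):
    """Render ranges as C initializer rows (hypothetical layout)."""
    return "\n".join("{ 0x%04X, 0x%04X }," % r for r in ranges)


ranges = merge_ranges(zero_width_code_points(SAMPLE.splitlines()))
table = format_c_table(ranges)
print(table)
```

To run it against the real data, the `SAMPLE.splitlines()` argument would be replaced by the lines of a downloaded UnicodeData.txt.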
The fullwidth-checking code looks like it was done by hand, although
apparently uniset can process EastAsianWidth.txt.
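The width data could be handled in the same spirit. The following sketch assumes the EastAsianWidth.txt layout of "code-or-range;width # comment" and keeps only the "W" and "F" entries; the sample lines are illustrative and the function names are made up for this example.

```python
# Illustrative lines in the EastAsianWidth.txt layout; the real file is
# much longer and its exact comments and spacing may differ.
SAMPLE = """\
# comment lines and blank lines are skipped

0041..005A;Na # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00A1;A # INVERTED EXCLAMATION MARK
1100..115F;W # HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER
3000;F # IDEOGRAPHIC SPACE
AC00..D7A3;W # HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
"""


def wide_ranges(lines, wanted=("W", "F")):
    """Return sorted inclusive (first, last) ranges with a wanted width."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if ";" not in data:
            continue  # blank or comment-only line
        span, width = (part.strip() for part in data.split(";")[:2])
        if width not in wanted:
            continue
        first, _, last = span.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo  # single code point
        ranges.append((lo, hi))
    return sorted(ranges)


def is_wide(cp, ranges):
    """Linear lookup; a bisect over the sorted ranges would also work."""
    return any(lo <= cp <= hi for lo, hi in ranges)


wide = wide_ranges(SAMPLE.splitlines())
for lo, hi in wide:
    print("U+%04X..U+%04X" % (lo, hi))
```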
Kevin