From: Kevin Bracey <kevin@bracey.fi>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: Peter Krefting <peter@softwolves.pp.se>, git@vger.kernel.org
Subject: Re: [PATCH] Unicode: update of combining code points
Date: Thu, 17 Apr 2014 09:32:34 +0300
Message-ID: <534F7582.9090303@bracey.fi>
In-Reply-To: <534EE0E7.2030608@web.de>
On 16/04/2014 22:58, Torsten Bögershausen wrote:
> Excellent, thanks for the pointers.
> Running the script below shows that
> "0X00AD SOFT HYPHEN" should have zero length (and some others too).
> I wonder if that is really the case, and which one of the last 2 lines
> in the script is the right one.
>
> What does this mean for us:
> "Cf Format a format control character"
>
It might be worth digging back through the Git logs to check the
original logic, but the comments suggest that "Cf" characters have been
treated as zero-width. That makes sense: they're usually markers
indicating things like bidirectional text flow, so they won't take up
any space. (Although they may cause even more extreme layout effects...)
Soft-hyphen is noted as an explicit exception to the rule in the utf8.c
comments. As of Unicode 4.0, it's supposed to be a character indicating
a point where a hyphen could be placed if a line-wrap occurs, and if
that wrap happens, then it can actually take up 1 space, otherwise not.
So its width could be either 0 or 1, depending on whether the line
breaks there. Or, quite likely, the terminal doesn't treat it specially,
and it always just looks like a hyphen... Thus we err on the safe side
and give it width 1.
See http://en.wikipedia.org/wiki/Soft_hyphen for background.
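As a rough illustration of that rule (a sketch only, not the actual
utf8.c logic), here is how the "Cf is zero-width, except soft hyphen"
decision could look in Python. The function name approx_char_width is
made up for this example, and the categories come from whatever Unicode
version your Python's unicodedata module ships with.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"

def approx_char_width(ch):
    """Rough column width of one character (hypothetical helper)."""
    if ch == SOFT_HYPHEN:
        # Category Cf, but treated as width 1 to err on the safe side.
        return 1
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        # Combining marks and format controls take no column of their own.
        return 0
    # Everything else is assumed single-width in this sketch.
    return 1
```

This deliberately ignores double-width characters; it only shows where
the soft-hyphen exception sits relative to the general Cf rule.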
The comments suggest adding "-00AD +1160-11FF" to the uniset command
line: the first for that soft-hyphen tweak, the second for composing
Hangul (the Jamo medial vowels and final consonants, which combine onto
the preceding leading consonant). The +200B tweak isn't necessary any
more: Zero-Width Space U+200B officially became Cf in Unicode 4.0.1:
http://en.wikipedia.org/wiki/Zero-width_space
http://www.unicode.org/review/resolved-pri.html#pri21
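If uniset isn't to hand, the same selection can be approximated with
Python's unicodedata module. This is a sketch under assumptions: the
function name zero_width_ranges and the limit parameter are invented
here, and the result depends on the Unicode version bundled with your
Python rather than on any particular UnicodeData.txt.

```python
import unicodedata

def zero_width_ranges(limit=0x3000):
    """Collect zero-width code points below `limit` as inclusive ranges."""
    ranges = []
    for cp in range(limit):
        zero = unicodedata.category(chr(cp)) in ("Mn", "Me", "Cf")
        if cp == 0x00AD:
            zero = False   # the "-00AD" tweak: soft hyphen keeps width 1
        if 0x1160 <= cp <= 0x11FF:
            zero = True    # the "+1160-11FF" tweak: composing Hangul Jamo
        if not zero:
            continue
        if ranges and ranges[-1][1] == cp - 1:
            ranges[-1] = (ranges[-1][0], cp)   # extend the current run
        else:
            ranges.append((cp, cp))            # start a new run
    return ranges
```

Note that U+200B lands in the output through its Cf category alone,
with no explicit tweak.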
All of this is only really an approximation - a best-effort attempt to
figure out the width of a string without any actual communication with
the display device. So it'll never be perfect. The choice between double
and single width in particular will often be unpredictable unless you
have deeper locale knowledge.
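One place that unpredictability shows up is the East Asian "Ambiguous"
width class, which I'm assuming is the sort of thing meant here. A
minimal sketch: approx_str_width and its ambiguous_wide flag are
hypothetical names, standing in for the locale knowledge a real
terminal would have.

```python
import unicodedata

def approx_str_width(text, ambiguous_wide=False):
    """Best-effort column count for a string (hypothetical helper)."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # combining marks sit on the previous character
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F") or (eaw == "A" and ambiguous_wide):
            width += 2  # wide, fullwidth, or ambiguous in a CJK context
        else:
            width += 1
    return width
```

The same Greek letter or box-drawing character can come out as 1 or 2
depending on that flag, and nothing in the string itself tells you
which one the display will pick.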
Actually, while doing this, I've realised that this was originally
Markus Kuhn's implementation, and that is acknowledged at the top of the
file:
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
Good, because he knows what he's doing.
Kevin