From: ZelinskiyIS <ivze@bk.ru>
To: dash@vger.kernel.org
Subject: A problem with CTLESC, CTLQUOTEMARK and UTF-8.
Date: Mon, 07 Sep 2009 11:46:44 +0400
Message-ID: <4AA4BA64.1080405@bk.ru>
Good day (or night)!
I am an Ubuntu user; as of Jaunty 9.04, dash 0.5.4 is installed as the
default sh interpreter. Ubuntu uses multibyte UTF-8 to represent
non-ASCII characters, and these characters often appear in file names.
I found the bug while trying to work out why a Python script that does
some simple conversion of music files would fail on songs with Cyrillic
names containing the letters с, ш, Ё. The cause was the sh invoked by the
system(...) call: it created files with garbage in their names when a
"> $file_name" redirection was used and $file_name contained any of these
three letters.
For example, the sequence рсшЁъ (byte by byte)
{d1 80 d1 81 d1 88 d0 81 d1 8a}
is turned into
{d1 80 d1 d1 d0 d1 8a}. The bytes 0x81 and 0x88 disappear from the file
name.
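The byte loss in this example can be reproduced with a small model in C.
This is only a sketch, not dash's own code: it assumes that the shell
treats 0x81 and 0x88 as its internal CTLESC and CTLQUOTEMARK markers (the
names from the subject line; the values are inferred from the bytes that
vanish), that a quote mark is simply dropped, and that an escape marker is
dropped while the byte after it is kept. The names MODEL_CTLESC,
MODEL_CTLQUOTEMARK and model_remove_escapes are hypothetical.

```c
#include <stddef.h>

/* Assumed marker values, inferred from the bytes that vanish in the report. */
#define MODEL_CTLESC       0x81u
#define MODEL_CTLQUOTEMARK 0x88u

/*
 * Simplified, hypothetical model of escape removal (not dash's rmescapes):
 * every quote-mark byte is dropped, every escape byte is dropped and the
 * byte that follows it is kept verbatim. Works in place and returns the
 * new length of the buffer.
 */
size_t model_remove_escapes(unsigned char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        unsigned char c = buf[in++];

        if (c == MODEL_CTLQUOTEMARK)
            continue;                   /* marker byte is discarded */
        if (c == MODEL_CTLESC) {
            if (in < len)
                buf[out++] = buf[in++]; /* escaped byte is copied as is */
            continue;
        }
        buf[out++] = c;                 /* ordinary byte */
    }
    return out;
}
```

Calling model_remove_escapes() on the ten bytes listed above leaves the
seven bytes {d1 80 d1 d1 d0 d1 8a} in the buffer, which is the garbage
name observed.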
The cause of this behaviour is at expand.c:239-240 in dash 0.5.4. The
lines and the bug look similar in dash 0.5.5.1, where the location is
expand.c:216-217.
The piece of code:
########################################################################
	if (flag & EXP_REDIR) /*XXX - for now, just remove escapes */
		rmescapes(p);
########################################################################
cuts bytes 0x81 and 0x88. This behaviour seems to be always unwanted,
because according to the UTF-8 specification, 0x81 and 0x88 cannot
represent a character on their own. Indeed, hex 81 = binary 10000001 and
hex 88 = binary 10001000; the upper two bits are 10, which means that the
byte is a continuation byte and must always follow a lead byte (see
http://en.wikipedia.org/wiki/UTF-8#Description).
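The bit argument can be checked with a few lines of C. This is a minimal
sketch; the function names utf8_is_continuation and utf8_encode_2byte are
made up for illustration and are not part of dash. The encoder handles
only the two-byte range U+0080..U+07FF, which is enough for the basic
Cyrillic letters.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when the byte has the bit pattern 10xxxxxx, i.e. it is a UTF-8
 * continuation byte that can only follow a lead byte. */
bool utf8_is_continuation(unsigned char byte)
{
    return (byte & 0xC0u) == 0x80u;
}

/* Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
 * Returns 2 on success, 0 if the code point is outside that range. */
size_t utf8_encode_2byte(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80u || cp > 0x7FFu)
        return 0;
    out[0] = (unsigned char)(0xC0u | (cp >> 6));    /* 110xxxxx lead byte */
    out[1] = (unsigned char)(0x80u | (cp & 0x3Fu)); /* 10xxxxxx trailing byte */
    return 2;
}
```

For с (U+0441), ш (U+0448) and Ё (U+0401) the encoder gives d1 81, d1 88
and d0 81, so the trailing byte of each of these letters is exactly one of
the two bytes that get cut, and utf8_is_continuation() returns true for
both 0x81 and 0x88.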
The problem probably does not occur with the single-byte KOI8-R
encoding for Cyrillic, which is the default for Debian.
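A rough way to see why KOI8-R is likely unaffected, sketched in C. It
assumes the standard KOI8-R layout, in which the Cyrillic letters occupy
0xC0..0xFF plus 0xA3 (ё) and 0xB3 (Ё); the function name
koi8r_is_cyrillic_letter is hypothetical.

```c
#include <stdbool.h>

/* True when the byte is a Cyrillic letter in KOI8-R (assumed layout):
 * 0xC0..0xFF hold the basic letters, 0xA3 is ё and 0xB3 is Ё. */
bool koi8r_is_cyrillic_letter(unsigned char byte)
{
    return byte >= 0xC0u || byte == 0xA3u || byte == 0xB3u;
}
```

Under that assumption neither 0x81 nor 0x88 is a letter, so a Cyrillic
file name in KOI8-R would not contain the bytes that get removed.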
I have also created a launchpad bug for Ubuntu,
https://bugs.launchpad.net/ubuntu/+source/dash/+bug/422298.
That's it, thanks for your attention.