From: "H. Peter Anvin" <hpa@zytor.com>
To: David Newall <davidn@davidnewall.com>
Cc: Jan Engelhardt <jengelh@computergmbh.de>,
"John T." <j.thomast@yahoo.com>,
linux-kernel@vger.kernel.org
Subject: Re: UTF-8 and Alt key in the console
Date: Tue, 01 Apr 2008 17:38:40 -0700
Message-ID: <47F2D590.2040300@zytor.com>
In-Reply-To: <47F2CD1B.7060804@davidnewall.com>
David Newall wrote:
> Jan Engelhardt wrote:
>> Hence the proposal of using definite start and end markers:
>>
>> echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat
>
> I see no merit in the idea. Most seriously, there isn't any real-world
> problem being solved. In addition, it proposes creating yet another
> type of terminal emulation. If there's something you don't like about
> VT escape codes, use a different emulation. For example, Televideo
> terminals used almost exclusively single-character control codes,
> reducing the scope of being mid-sequence to, well much closer to zero.
>
> You need to make quite clear that your proposal is to discontinue use of
> VT terminal emulation.
Okay, let's put this to rest once and for all:
*** ISO 6429 sequences are self-terminating. ***
No, you can't tell you're inside one if you miss the leading CSI, but as
has been pointed out, there really isn't a huge case for it.
The standard is available for free under the name ECMA-48:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf
It references ISO 2022, a.k.a. ECMA-35:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-035.pdf
These standards use a decimalized hexadecimal notation, so if you see
"05/10" it means 0x5a. A "column" refers to a 16-character set, so
"column 4" refers to bytes 0x40 to 0x4f.
The structure is defined in section 5.4 of ISO 6429/ECMA-48:
-----------
5.4 Control sequences
A control sequence is a string of bit combinations starting with the
control function CONTROL SEQUENCE INTRODUCER (CSI) followed by one or
more bit combinations representing parameters, if any, and by one or
more bit combinations identifying the control function. The control
function CSI itself is an element of the C1 set.
The format of a control sequence is
CSI P ... P I ... I F
where
a) CSI is represented by bit combinations 01/11 (representing ESC) and
05/11 in a 7-bit code or by bit combination 09/11 in an 8-bit code, see 5.3;
b) P ... P are Parameter Bytes, which, if present, consist of bit
combinations from 03/00 to 03/15;
c) I ... I are Intermediate Bytes, which, if present, consist of bit
combinations from 02/00 to 02/15. Together with the Final Byte F, they
identify the control function;
NOTE The number of Intermediate Bytes is not limited by this Standard;
in practice, one Intermediate Byte will be sufficient since with sixteen
different bit combinations available for the Intermediate Byte over one
thousand control functions may be identified.
d) F is the Final Byte; it consists of a bit combination from 04/00 to
07/14; it terminates the control sequence and together with the
Intermediate Bytes, if present, identifies the control function. Bit
combinations 07/00 to 07/14 are available as Final Bytes of control
sequences for private (or experimental) use.
-----------
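The format quoted above translates almost directly into code. What follows is a minimal Python sketch, not taken from any existing terminal emulator; ControlSequence and parse_csi are names invented for this illustration. It assumes the input is a raw byte string, and it accepts the single byte 0x9b as CSI, which is only appropriate for a true 8-bit code (in a UTF-8 stream 0x9b is a continuation byte, so a real console would have to decide that question separately).

```python
from typing import NamedTuple, Optional


class ControlSequence(NamedTuple):
    params: bytes         # Parameter Bytes, 03/00 to 03/15 (0x30..0x3f)
    intermediates: bytes  # Intermediate Bytes, 02/00 to 02/15 (0x20..0x2f)
    final: int            # Final Byte, 04/00 to 07/14 (0x40..0x7e)
    length: int           # bytes consumed, including the introducer


def parse_csi(data: bytes, start: int = 0) -> Optional[ControlSequence]:
    """Parse one control sequence beginning at data[start].

    Returns None if no CSI starts there, or if the sequence is
    truncated or malformed.
    """
    if data[start:start + 2] == b"\x1b[":   # 7-bit form: ESC followed by 05/11
        i = start + 2
    elif data[start:start + 1] == b"\x9b":  # 8-bit form: 09/11 (assumption: raw 8-bit code)
        i = start + 1
    else:
        return None

    param_start = i
    while i < len(data) and 0x30 <= data[i] <= 0x3F:
        i += 1
    inter_start = i
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1

    # The Final Byte terminates the sequence; anything else is an error.
    if i < len(data) and 0x40 <= data[i] <= 0x7E:
        return ControlSequence(data[param_start:inter_start],
                               data[inter_start:i],
                               data[i],
                               i + 1 - start)
    return None


if __name__ == "__main__":
    print(parse_csi(b"\x1b[43m wonderful \x1b[0m"))
```

Because the Final Byte range does not overlap the Parameter or Intermediate ranges, the scan stops at the first byte in 0x40..0x7e without needing any table of known control functions.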
Note: DEC added nonstandard control sequences initiated with SS3 (ESC O)
as well as CSI (ESC [); otherwise they use the same format.
The Final Byte is easy enough to spot, as is writing a generic parser
that can pick this apart, including parameter handling.
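As an illustration of the parameter-handling part, here is a minimal Python sketch that splits a string of Parameter Bytes into numeric values. The name split_params is invented for this example. It relies on the ECMA-48 convention that parameter sub-strings are decimal digits separated by 03/11 (";"), that an empty sub-string stands for the default value, and that a string starting with 03/12 to 03/15 is reserved for private use. For simplicity it rejects private strings and anything containing 03/10 (":") instead of interpreting them.

```python
from typing import List, Optional


def split_params(param_bytes: bytes) -> List[Optional[int]]:
    """Split Parameter Bytes into integers; None marks a default value."""
    if not param_bytes:
        return []
    if 0x3C <= param_bytes[0] <= 0x3F:
        # Leading '<', '=', '>' or '?': private-use parameter string.
        raise ValueError("private parameter string not handled in this sketch")

    values: List[Optional[int]] = []
    for field in param_bytes.split(b";"):
        if field == b"":
            values.append(None)          # omitted parameter: use the default
        elif field.isdigit():
            values.append(int(field))    # plain decimal number
        else:
            # For example a ':' (03/10) separator, which this sketch skips.
            raise ValueError(f"unsupported parameter sub-string: {field!r}")
    return values


if __name__ == "__main__":
    print(split_params(b"1;;31"))
```

Returning None for omitted parameters leaves the choice of default to whichever control function the Final Byte selects, since defaults differ from one function to the next.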
-hpa