public inbox for linux-msdos@vger.kernel.org
* more cyrillic charsets
@ 2003-02-23 22:31 Grigory Batalov
  0 siblings, 0 replies; only message in thread
From: Grigory Batalov @ 2003-02-23 22:31 UTC
  To: linux-msdos

[-- Attachment #1: Type: text/plain, Size: 1928 bytes --]

  Hi!
  I've been asked about Ukrainian support in dosemu.
If I understand correctly (I'm not sure =)), we need specific
tables in extra_charsets.

  The main idea is to get the specific characters that are not
already in koi8-r because Russians don't use them.
  According to Roman Czyborra's great document
(http://czyborra.com/charsets/cyrillic.html) and the Unicode
layout (http://www.unicode.org/charts/PDF/U0400.pdf),
we can get the following characters in the same cp866 internal_charset:

8bit - Unicode
--------------
0xF2 - 0x0404 CYRILLIC CAPITAL LETTER UKRAINIAN IE
0xF3 - 0x0454 CYRILLIC SMALL LETTER UKRAINIAN IE
0xF4 - 0x0407 CYRILLIC CAPITAL LETTER YI
0xF5 - 0x0457 CYRILLIC SMALL LETTER YI

simply by restoring them (they are currently replaced with
other useful characters, but I think this is not correct).
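
  The four cp866 positions listed above can be cross-checked against an
independent table. Below is a minimal sketch that assumes Python's
built-in "cp866" codec as the reference; it is not dosemu's own table,
and the helper name cp866_to_unicode is made up for illustration.

```python
# Assumption: Python's built-in "cp866" codec serves as the reference table;
# it is independent of dosemu's internal_charset tables.
EXPECTED = {
    0xF2: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xF3: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xF4: 0x0407,  # CYRILLIC CAPITAL LETTER YI
    0xF5: 0x0457,  # CYRILLIC SMALL LETTER YI
}


def cp866_to_unicode(byte: int) -> int:
    """Return the Unicode code point that cp866 assigns to a single byte."""
    return ord(bytes([byte]).decode("cp866"))


# Print the pairs in the same "8bit - Unicode" layout as the table above.
for byte, code_point in EXPECTED.items():
    print(f"0x{byte:02X} - 0x{cp866_to_unicode(byte):04X} (expected 0x{code_point:04X})")
```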

  We are missing
0x0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
0x0490 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
as there is no room for these symbols in cp866,
and
0xF6 - 0x040E CYRILLIC CAPITAL LETTER SHORT U
0xF7 - 0x045E CYRILLIC SMALL LETTER SHORT U
as they are not present in the koi8-u charset
(but they are in cp1251, so this is open to discussion).
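
  Which charset carries which of these letters can be checked
mechanically. The sketch below assumes Python's built-in "cp866",
"koi8_u" and "cp1251" codecs match the charsets discussed here; the
function name charsets_containing is hypothetical.

```python
import unicodedata

# Assumption: Python's built-in codecs stand in for the charsets in question.
CHARSETS = ("cp866", "koi8_u", "cp1251")


def charsets_containing(code_point: int, charsets=CHARSETS) -> list:
    """List the charsets that can encode the given Unicode code point."""
    found = []
    for name in charsets:
        try:
            chr(code_point).encode(name)
        except UnicodeEncodeError:
            continue  # this charset has no slot for the character
        found.append(name)
    return found


# GHE WITH UPTURN and SHORT U, capital and small.
for cp in (0x0490, 0x0491, 0x040E, 0x045E):
    print(f"0x{cp:04X} {unicodedata.name(chr(cp))}: {charsets_containing(cp)}")
```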

  The symbols
0x0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
  in koi8-u can be mapped to
0x0049 LATIN CAPITAL LETTER I
0x0069 LATIN SMALL LETTER I
  as they look very similar and the Cyrillic letters aren't present in cp866.
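
  The fallback described above could look roughly like this. It is only
an illustration of the idea using Python's built-in codecs, not the
code path dosemu uses; FALLBACK and koi8u_to_cp866 are hypothetical
names.

```python
# Assumption: a look-alike substitution applied before encoding to cp866.
FALLBACK = {
    "\u0406": "I",  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}


def koi8u_to_cp866(data: bytes) -> bytes:
    """Convert koi8-u bytes to cp866, substituting Latin I/i for the
    Cyrillic letters that cp866 lacks."""
    text = data.decode("koi8_u")
    text = "".join(FALLBACK.get(ch, ch) for ch in text)
    # Anything still unmappable (e.g. GHE WITH UPTURN) becomes "?".
    return text.encode("cp866", errors="replace")


sample = "\u0406\u0456\u0404".encode("koi8_u")
print(koi8u_to_cp866(sample))
```

Characters that do have a cp866 slot, such as 0x0404, pass through
to their own code (0xF2); only the two I letters are replaced.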

  I made the needed tables for cp1251 and koi8-u with the attached
Perl script. It generates an exact copy of koi8-r.c, so I think
it's correct =) (checked with 'diff -Bubi'). The koi8-u table
was corrected by hand a little.
  The tables are in the attached patch. There is also a change to
two Cyrillic VGA fonts that lets them display
symbols 0xF2, 0xF3, 0xF4, 0xF5. As I mentioned above, it
would be nice to implement symbols 0xF6, 0xF7 there as well, but
I haven't had a request for them, so I'm not sure whether they are
really needed.

-- 
 Grigory Batalov.

[-- Attachment #2: dosemu-1.1.4.13-extra_charsets.diff.bz2 --]
[-- Type: application/x-bzip2, Size: 2156 bytes --]

[-- Attachment #3: extra_charset.pl --]
[-- Type: text/plain, Size: 1320 bytes --]

#!/usr/bin/perl -w

use strict;
require Unicode::Map8;

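# $locale is the charset name passed to Unicode::Map8; $_locale is the
# same name with '-' replaced by '_' so it can be used in C identifiers.
my ($locale, $_locale);
$_locale = $locale = "koi8-r";
$_locale =~ s/-/_/g;

my $map = Unicode::Map8->new($locale)
    or die "Unicode::Map8 has no mapping for $locale\n";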
my ($i, $j);


print "\x23include \"translate.h\"

static const t_unicode ${_locale}_c1_chars[] = {\n";

# C1 range: 4 rows of 8 code points, 0x80-0x9F (32 entries).
for ($i = 0; $i < 4; $i++) {
    for ($j = 0; $j < 8; $j++) {
        printf("0x%04X, ", $map->to_char16(0x80 + $i*8 + $j));
    }
    printf("/* 0x%02X-0x%02X */\n", 0x80 + $i*8, 0x80 + $i*8 + 7);
}

print "};
struct char_set ${_locale}_c1 = {
	1,
	CHARS(${_locale}_c1_chars),
	0, \"\", 0, 32,
};


static const t_unicode ${_locale}_g1_chars[] = {\n";

# G1 range: 12 rows of 8 code points, 0xA0-0xFF (96 entries).
for ($i = 0; $i < 12; $i++) {
    for ($j = 0; $j < 8; $j++) {
        printf("0x%04X, ", $map->to_char16(0xA0 + $i*8 + $j));
    }
    printf("/* 0x%02X-0x%02X */\n", 0xA0 + $i*8, 0xA0 + $i*8 + 7);
}

print "};

struct char_set ${_locale}_g1 = {
	1,
	CHARS(${_locale}_g1_chars),
	0, \"\", 1, 96,
};

struct char_set $_locale = {
	.c0 = &ascii_c0,
	.g0 = &ascii_g0,
	.c1 = &${_locale}_c1,
	.g1 = &${_locale}_g1,
	.names = { \"$locale\", 0 },
};

struct char_set ${_locale}_safe = {
	.c0 = &ascii_c0,
	.g0 = &ascii_g0,
	.c1 = &ascii_c1,
	.g1 = &${_locale}_g1,
	.names = { \"${locale}-safe\", 0 },
};

CONSTRUCTOR(static void init(void))
{
	register_charset(&$_locale);
	register_charset(&${_locale}_safe);
}\n";
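
The table rows that the script prints can be reproduced without
Unicode::Map8 as a cross-check. The sketch below is a hypothetical
helper (table_rows) built on Python's built-in codecs; it emits 0xFFFD
for bytes the codec leaves unmapped, which may differ from what the
Perl module prints for such bytes.

```python
def table_rows(charset: str, start: int, rows: int) -> list:
    """Format `rows` lines of 8 code points each, beginning at byte `start`,
    in the same layout as the Perl script's printf calls."""
    lines = []
    for i in range(rows):
        base = start + i * 8
        cells = "".join(
            # Unmapped bytes decode to U+FFFD (assumption of this sketch).
            "0x%04X, " % ord(bytes([base + j]).decode(charset, errors="replace"))
            for j in range(8)
        )
        lines.append("%s/* 0x%02X-0x%02X */" % (cells, base, base + 7))
    return lines


# C1 block (0x80-0x9F) and G1 block (0xA0-0xFF) for koi8-r.
for line in table_rows("koi8_r", 0x80, 4) + table_rows("koi8_r", 0xA0, 12):
    print(line)
```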

