* [PATCH] UTF-8 to UTF-16 transformation @ 2009-08-24 19:22 Vladimir 'phcoder' Serbinenko 2009-08-24 19:23 ` Vladimir 'phcoder' Serbinenko 0 siblings, 1 reply; 9+ messages in thread From: Vladimir 'phcoder' Serbinenko @ 2009-08-24 19:22 UTC (permalink / raw) To: The development of GRUB 2 Splitted from my newtree patch -- Regards Vladimir 'phcoder' Serbinenko Personal git repository: http://repo.or.cz/w/grub2/phcoder.git ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] UTF-8 to UTF-16 transformation 2009-08-24 19:22 [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko @ 2009-08-24 19:23 ` Vladimir 'phcoder' Serbinenko 2009-08-26 0:31 ` Robert Millan 0 siblings, 1 reply; 9+ messages in thread From: Vladimir 'phcoder' Serbinenko @ 2009-08-24 19:23 UTC (permalink / raw) To: The development of GRUB 2 [-- Attachment #1: Type: text/plain, Size: 349 bytes --] On Mon, Aug 24, 2009 at 9:22 PM, Vladimir 'phcoder' Serbinenko<phcoder@gmail.com> wrote: > Splitted from my newtree patch > > -- > Regards > Vladimir 'phcoder' Serbinenko > > Personal git repository: http://repo.or.cz/w/grub2/phcoder.git > -- Regards Vladimir 'phcoder' Serbinenko Personal git repository: http://repo.or.cz/w/grub2/phcoder.git [-- Attachment #2: utf.diff --] [-- Type: text/plain, Size: 4878 bytes --] 2009-08-24 Vladimir Serbinenko <phcoder@gmail.com> UTF-8 to UTF-16 transformation. * conf/common.rmk (pkglib_MODULES): Add utf.mod (utf_mod_SOURCES): New variable. (utf_mod_CFLAGS): Likewise. (utf_mod_LDFLAGS): Likewise. * include/grub/utf.h: New file. * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c) diff --git a/conf/common.rmk b/conf/common.rmk index b0d3785..b5a6048 100644 --- a/conf/common.rmk +++ b/conf/common.rmk @@ -617,3 +617,8 @@ pkglib_MODULES += setjmp.mod setjmp_mod_SOURCES = lib/$(target_cpu)/setjmp.S setjmp_mod_ASFLAGS = $(COMMON_ASFLAGS) setjmp_mod_LDFLAGS = $(COMMON_LDFLAGS) + +pkglib_MODULES += utf.mod +utf_mod_SOURCES = lib/utf.c +utf_mod_CFLAGS = $(COMMON_CFLAGS) +utf_mod_LDFLAGS = $(COMMON_LDFLAGS) diff --git a/include/grub/utf.h b/include/grub/utf.h new file mode 100644 index 0000000..2091916 --- /dev/null +++ b/include/grub/utf.h @@ -0,0 +1,29 @@ +/* + * GRUB -- GRand Unified Bootloader + * Copyright (C) 2009 Free Software Foundation, Inc. + * + * GRUB is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * GRUB is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GRUB. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef GRUB_UTF_HEADER +#define GRUB_UTF_HEADER 1 + +#include <grub/types.h> + +grub_ssize_t +grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize, + const grub_uint8_t *src, grub_size_t srcsize, + const grub_uint8_t **srcend); + +#endif diff --git a/lib/utf.c b/lib/utf.c new file mode 100644 index 0000000..1f89f2f --- /dev/null +++ b/lib/utf.c @@ -0,0 +1,116 @@ +/* + * GRUB -- GRand Unified Bootloader + * Copyright (C) 1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009 Free Software Foundation, Inc. + * + * GRUB is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * GRUB is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GRUB. If not, see <http://www.gnu.org/licenses/>. + */ + +/* Convert a (possibly null-terminated) UTF-8 string of at most SRCSIZE + bytes (if SRCSIZE is -1, it is ignored) in length to a UTF-16 string. + Return the number of characters converted. DEST must be able to hold + at least DESTSIZE characters. If an invalid sequence is found, return -1. + If SRCEND is not NULL, then *SRCEND is set to the next byte after the + last byte used in SRC. */ + +#include <grub/utf.h> + +grub_ssize_t +grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize, + const grub_uint8_t *src, grub_size_t srcsize, + const grub_uint8_t **srcend) +{ + grub_uint16_t *p = dest; + int count = 0; + grub_uint32_t code = 0; + + if (srcend) + *srcend = src; + + while (srcsize && destsize) + { + grub_uint32_t c = *src++; + if (srcsize != (grub_size_t)-1) + srcsize--; + if (count) + { + if ((c & 0xc0) != 0x80) + { + /* invalid */ + return -1; + } + else + { + code <<= 6; + code |= (c & 0x3f); + count--; + } + } + else + { + if (c == 0) + break; + + if ((c & 0x80) == 0x00) + code = c; + else if ((c & 0xe0) == 0xc0) + { + count = 1; + code = c & 0x1f; + } + else if ((c & 0xf0) == 0xe0) + { + count = 2; + code = c & 0x0f; + } + else if ((c & 0xf8) == 0xf0) + { + count = 3; + code = c & 0x07; + } + else if ((c & 0xfc) == 0xf8) + { + count = 4; + code = c & 0x03; + } + else if ((c & 0xfe) == 0xfc) + { + count = 5; + code = c & 0x01; + } + else + return -1; + } + + if (count == 0) + { + if (destsize < 2 && code > 0x10000) + break; + if (code > 0x10000) + { + *p++ = 0xD800 + (((code - 0x10000) >> 12) & 0xfff); + *p++ = 0xDC00 + ((code - 0x10000) & 0xfff); + destsize -= 2; + } + else + { + *p++ = code; + destsize--; + } + } + } + + if (srcend) + *srcend = src; + return p - dest; +} ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] UTF-8 to UTF-16 transformation 2009-08-24 19:23 ` Vladimir 'phcoder' Serbinenko @ 2009-08-26 0:31 ` Robert Millan 2009-08-26 23:27 ` Joe Auricchio 2009-08-27 21:31 ` Vladimir 'phcoder' Serbinenko 0 siblings, 2 replies; 9+ messages in thread From: Robert Millan @ 2009-08-26 0:31 UTC (permalink / raw) To: The development of GRUB 2 On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote: > 2009-08-24 Vladimir Serbinenko <phcoder@gmail.com> > > UTF-8 to UTF-16 transformation. > > * conf/common.rmk (pkglib_MODULES): Add utf.mod > (utf_mod_SOURCES): New variable. > (utf_mod_CFLAGS): Likewise. > (utf_mod_LDFLAGS): Likewise. > * include/grub/utf.h: New file. > * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c) Sounds like we could end up needing more of this (to other charsets), so why not give this module a generic name to hint as to where it can be added? The conversion functions in kern/misc.c could eventually move there as well, once UTF-8 support becomes optional in the kernel. GNU libc has "iconv" command and "iconv_*" facilities for charset conversion, how about iconv.mod for consistency? > + if ((c & 0x80) == 0x00) > + code = c; > + else if ((c & 0xe0) == 0xc0) These should be macroified. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] UTF-8 to UTF-16 transformation 2009-08-26 0:31 ` Robert Millan @ 2009-08-26 23:27 ` Joe Auricchio 2009-08-27 21:31 ` Vladimir 'phcoder' Serbinenko 1 sibling, 0 replies; 9+ messages in thread From: Joe Auricchio @ 2009-08-26 23:27 UTC (permalink / raw) To: The development of GRUB 2 On Tue, Aug 25, 2009 at 17:31, Robert Millan<rmh@aybabtu.com> wrote: > GNU libc has "iconv" command and "iconv_*" facilities for charset conversion, > how about iconv.mod for consistency? My 2 cents is: that might be confusing. At least, if I saw iconv.mod, I would assume it was really GNU libc's iconv library. char_conv.mod? charset_conv.mod? charset.mod? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] UTF-8 to UTF-16 transformation 2009-08-26 0:31 ` Robert Millan 2009-08-26 23:27 ` Joe Auricchio @ 2009-08-27 21:31 ` Vladimir 'phcoder' Serbinenko 2009-08-27 22:11 ` Installing Solaris without a CDROM Seth Goldberg ` (2 more replies) 1 sibling, 3 replies; 9+ messages in thread From: Vladimir 'phcoder' Serbinenko @ 2009-08-27 21:31 UTC (permalink / raw) To: The development of GRUB 2 On Wed, Aug 26, 2009 at 2:31 AM, Robert Millan<rmh@aybabtu.com> wrote: > On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote: > >> 2009-08-24 Vladimir Serbinenko <phcoder@gmail.com> >> >> UTF-8 to UTF-16 transformation. >> >> * conf/common.rmk (pkglib_MODULES): Add utf.mod >> (utf_mod_SOURCES): New variable. >> (utf_mod_CFLAGS): Likewise. >> (utf_mod_LDFLAGS): Likewise. >> * include/grub/utf.h: New file. >> * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c) > > Sounds like we could end up needing more of this (to other charsets), so > why not give this module a generic name to hint as to where it can be added? > I'm ok with renaming but whether a conversion goes to charset.mod is perhaps to be decided on case-by-case basis- > The conversion functions in kern/misc.c could eventually move there as well, > once UTF-8 support becomes optional in the kernel. utf16_to_utf8 can be moved now out of the kernel but it's used by some fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate module? This would decrease the size of biggest cores with the price of its increase in smaller cores. > > GNU libc has "iconv" command and "iconv_*" facilities for charset conversion, > how about iconv.mod for consistency? > >> + if ((c & 0x80) == 0x00) >> + code = c; >> + else if ((c & 0xe0) == 0xc0) > > These should be macroified. > Actually this are accelerated bitchecks (bit numbers follow specific and easy pattern) and for real readability would have to be written in binary but AFAIK binary notation isn't supported in C code and would result in overly long strings > -- > Robert Millan > > The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and > how) you may access your data; but nobody's threatening your freedom: we > still allow you to remove your data and not access it at all." > > > _______________________________________________ > Grub-devel mailing list > Grub-devel@gnu.org > http://lists.gnu.org/mailman/listinfo/grub-devel > -- Regards Vladimir 'phcoder' Serbinenko Personal git repository: http://repo.or.cz/w/grub2/phcoder.git ^ permalink raw reply [flat|nested] 9+ messages in thread
* Installing Solaris without a CDROM 2009-08-27 21:31 ` Vladimir 'phcoder' Serbinenko @ 2009-08-27 22:11 ` Seth Goldberg 2009-08-28 13:21 ` [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko 2009-08-28 16:21 ` Robert Millan 2 siblings, 0 replies; 9+ messages in thread From: Seth Goldberg @ 2009-08-27 22:11 UTC (permalink / raw) To: The development of GRUB 2 Hi, You asked about it -- yes, there is. You can install it via a USB storage device (if your BIOS can boot from it). Take a look at this: http://opensolaris.org/jive/thread.jspa?threadID=107723&tstart=45 --S ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] UTF-8 to UTF-16 transformation 2009-08-27 21:31 ` Vladimir 'phcoder' Serbinenko 2009-08-27 22:11 ` Installing Solaris without a CDROM Seth Goldberg @ 2009-08-28 13:21 ` Vladimir 'phcoder' Serbinenko 2009-08-28 16:21 ` Robert Millan 2 siblings, 0 replies; 9+ messages in thread From: Vladimir 'phcoder' Serbinenko @ 2009-08-28 13:21 UTC (permalink / raw) To: The development of GRUB 2 [-- Attachment #1: Type: text/plain, Size: 2540 bytes --] On Thu, Aug 27, 2009 at 11:31 PM, Vladimir 'phcoder' Serbinenko<phcoder@gmail.com> wrote: > On Wed, Aug 26, 2009 at 2:31 AM, Robert Millan<rmh@aybabtu.com> wrote: >> On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote: >> >>> 2009-08-24 Vladimir Serbinenko <phcoder@gmail.com> >>> >>> UTF-8 to UTF-16 transformation. >>> >>> * conf/common.rmk (pkglib_MODULES): Add utf.mod >>> (utf_mod_SOURCES): New variable. >>> (utf_mod_CFLAGS): Likewise. >>> (utf_mod_LDFLAGS): Likewise. >>> * include/grub/utf.h: New file. >>> * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c) >> >> Sounds like we could end up needing more of this (to other charsets), so >> why not give this module a generic name to hint as to where it can be added? >> > I'm ok with renaming but whether a conversion goes to charset.mod is > perhaps to be decided on case-by-case basis- >> The conversion functions in kern/misc.c could eventually move there as well, >> once UTF-8 support becomes optional in the kernel. > utf16_to_utf8 can be moved now out of the kernel but it's used by some > fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate > module? This would decrease the size of biggest cores with the price > of its increase in smaller cores. >> >> GNU libc has "iconv" command and "iconv_*" facilities for charset conversion, >> how about iconv.mod for consistency? >> >>> + if ((c & 0x80) == 0x00) >>> + code = c; >>> + else if ((c & 0xe0) == 0xc0) >> >> These should be macroified. >> > Actually this are accelerated bitchecks (bit numbers follow specific > and easy pattern) and for real readability would have to be written in > binary but AFAIK binary notation isn't supported in C code and would > result in overly long strings >> -- >> Robert Millan >> >> The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and >> how) you may access your data; but nobody's threatening your freedom: we >> still allow you to remove your data and not access it at all." >> >> >> _______________________________________________ >> Grub-devel mailing list >> Grub-devel@gnu.org >> http://lists.gnu.org/mailman/listinfo/grub-devel >> > > > > -- > Regards > Vladimir 'phcoder' Serbinenko > > Personal git repository: http://repo.or.cz/w/grub2/phcoder.git > -- Regards Vladimir 'phcoder' Serbinenko Personal git repository: http://repo.or.cz/w/grub2/phcoder.git [-- Attachment #2: utf.diff --] [-- Type: text/plain, Size: 6422 bytes --] diff --git a/ChangeLog b/ChangeLog index ab542e2..367ab05 100644 --- a/ChangeLog +++ b/ChangeLog @@ -158,6 +158,17 @@ 2009-08-24 Vladimir Serbinenko <phcoder@gmail.com> + UTF-8 to UTF-16 transformation. + + * conf/common.rmk (pkglib_MODULES): Add utf.mod + (utf_mod_SOURCES): New variable. + (utf_mod_CFLAGS): Likewise. + (utf_mod_LDFLAGS): Likewise. + * include/grub/utf.h: New file. + * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c) + +2009-08-24 Vladimir Serbinenko <phcoder@gmail.com> + * script/sh/function.c (grub_script_function_find): Cut error message not to flood terminal. * script/sh/lexer.c (grub_script_yylex): Remove command line length diff --git a/conf/common.rmk b/conf/common.rmk index 7727f19..735e57a 100644 --- a/conf/common.rmk +++ b/conf/common.rmk @@ -633,3 +633,8 @@ pkglib_MODULES += setjmp.mod setjmp_mod_SOURCES = lib/$(target_cpu)/setjmp.S setjmp_mod_ASFLAGS = $(COMMON_ASFLAGS) setjmp_mod_LDFLAGS = $(COMMON_LDFLAGS) + +pkglib_MODULES += charset.mod +charset_mod_SOURCES = lib/charset.c +charset_mod_CFLAGS = $(COMMON_CFLAGS) +charset_mod_LDFLAGS = $(COMMON_LDFLAGS) diff --git a/include/grub/charset.h b/include/grub/charset.h new file mode 100644 index 0000000..22b6724 --- /dev/null +++ b/include/grub/charset.h @@ -0,0 +1,50 @@ +/* + * GRUB -- GRand Unified Bootloader + * Copyright (C) 2009 Free Software Foundation, Inc. + * + * GRUB is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * GRUB is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GRUB. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef GRUB_CHARSET_HEADER +#define GRUB_CHARSET_HEADER 1 + +#include <grub/types.h> + +#define GRUB_UINT8_1_LEADINGBIT 0x80 +#define GRUB_UINT8_2_LEADINGBITS 0xc0 +#define GRUB_UINT8_3_LEADINGBITS 0xe0 +#define GRUB_UINT8_4_LEADINGBITS 0xf0 +#define GRUB_UINT8_5_LEADINGBITS 0xf8 +#define GRUB_UINT8_6_LEADINGBITS 0xfc +#define GRUB_UINT8_7_LEADINGBITS 0xfe + +#define GRUB_UINT8_1_TRAILINGBIT 0x01 +#define GRUB_UINT8_2_TRAILINGBITS 0x03 +#define GRUB_UINT8_3_TRAILINGBITS 0x07 +#define GRUB_UINT8_4_TRAILINGBITS 0x0f +#define GRUB_UINT8_5_TRAILINGBITS 0x1f +#define GRUB_UINT8_6_TRAILINGBITS 0x3f + +#define GRUB_UCS2_LIMIT 0x10000 +#define GRUB_UTF16_UPPER_SURROGATE(code) \ + (0xD800 + ((((code) - GRUB_UCS2_LIMIT) >> 12) & 0xfff)) +#define GRUB_UTF16_LOWER_SURROGATE(code) \ + (0xDC00 + (((code) - GRUB_UCS2_LIMIT) & 0xfff)) + +grub_ssize_t +grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize, + const grub_uint8_t *src, grub_size_t srcsize, + const grub_uint8_t **srcend); + +#endif diff --git a/lib/charset.c b/lib/charset.c new file mode 100644 index 0000000..8bc5b91 --- /dev/null +++ b/lib/charset.c @@ -0,0 +1,116 @@ +/* + * GRUB -- GRand Unified Bootloader + * Copyright (C) 1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009 Free Software Foundation, Inc. + * + * GRUB is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * GRUB is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with GRUB. If not, see <http://www.gnu.org/licenses/>. + */ + +/* Convert a (possibly null-terminated) UTF-8 string of at most SRCSIZE + bytes (if SRCSIZE is -1, it is ignored) in length to a UTF-16 string. + Return the number of characters converted. DEST must be able to hold + at least DESTSIZE characters. If an invalid sequence is found, return -1. + If SRCEND is not NULL, then *SRCEND is set to the next byte after the + last byte used in SRC. */ + +#include <grub/charset.h> + +grub_ssize_t +grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize, + const grub_uint8_t *src, grub_size_t srcsize, + const grub_uint8_t **srcend) +{ + grub_uint16_t *p = dest; + int count = 0; + grub_uint32_t code = 0; + + if (srcend) + *srcend = src; + + while (srcsize && destsize) + { + grub_uint32_t c = *src++; + if (srcsize != (grub_size_t)-1) + srcsize--; + if (count) + { + if ((c & GRUB_UINT8_2_LEADINGBITS) != GRUB_UINT8_1_LEADINGBIT) + { + /* invalid */ + return -1; + } + else + { + code <<= 6; + code |= (c & GRUB_UINT8_6_TRAILINGBITS); + count--; + } + } + else + { + if (c == 0) + break; + + if ((c & GRUB_UINT8_1_LEADINGBIT) == 0) + code = c; + else if ((c & GRUB_UINT8_3_LEADINGBITS) == GRUB_UINT8_2_LEADINGBITS) + { + count = 1; + code = c & GRUB_UINT8_5_TRAILINGBITS; + } + else if ((c & GRUB_UINT8_4_LEADINGBITS) == GRUB_UINT8_3_LEADINGBITS) + { + count = 2; + code = c & GRUB_UINT8_4_TRAILINGBITS; + } + else if ((c & GRUB_UINT8_5_LEADINGBITS) == GRUB_UINT8_4_LEADINGBITS) + { + count = 3; + code = c & GRUB_UINT8_3_TRAILINGBITS; + } + else if ((c & GRUB_UINT8_6_LEADINGBITS) == GRUB_UINT8_5_LEADINGBITS) + { + count = 4; + code = c & GRUB_UINT8_2_TRAILINGBITS; + } + else if ((c & GRUB_UINT8_7_LEADINGBITS) == GRUB_UINT8_6_LEADINGBITS) + { + count = 5; + code = c & GRUB_UINT8_1_TRAILINGBIT; + } + else + return -1; + } + + if (count == 0) + { + if (destsize < 2 && code >= GRUB_UCS2_LIMIT) + break; + if (code >= GRUB_UCS2_LIMIT) + { + *p++ = GRUB_UTF16_UPPER_SURROGATE (code); + *p++ = GRUB_UTF16_LOWER_SURROGATE (code); + destsize -= 2; + } + else + { + *p++ = code; + destsize--; + } + } + } + + if (srcend) + *srcend = src; + return p - dest; +} ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] UTF-8 to UTF-16 transformation 2009-08-27 21:31 ` Vladimir 'phcoder' Serbinenko 2009-08-27 22:11 ` Installing Solaris without a CDROM Seth Goldberg 2009-08-28 13:21 ` [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko @ 2009-08-28 16:21 ` Robert Millan 2009-08-28 17:39 ` Vladimir 'phcoder' Serbinenko 2 siblings, 1 reply; 9+ messages in thread From: Robert Millan @ 2009-08-28 16:21 UTC (permalink / raw) To: The development of GRUB 2 On Thu, Aug 27, 2009 at 11:31:28PM +0200, Vladimir 'phcoder' Serbinenko wrote: > On Wed, Aug 26, 2009 at 2:31 AM, Robert Millan<rmh@aybabtu.com> wrote: > > On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote: > > > >> 2009-08-24 Vladimir Serbinenko <phcoder@gmail.com> > >> > >> UTF-8 to UTF-16 transformation. > >> > >> * conf/common.rmk (pkglib_MODULES): Add utf.mod > >> (utf_mod_SOURCES): New variable. > >> (utf_mod_CFLAGS): Likewise. > >> (utf_mod_LDFLAGS): Likewise. > >> * include/grub/utf.h: New file. > >> * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c) > > > > Sounds like we could end up needing more of this (to other charsets), so > > why not give this module a generic name to hint as to where it can be added? > > > I'm ok with renaming but whether a conversion goes to charset.mod is > perhaps to be decided on case-by-case basis- > > The conversion functions in kern/misc.c could eventually move there as well, > > once UTF-8 support becomes optional in the kernel. > utf16_to_utf8 can be moved now out of the kernel but it's used by some > fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate > module? This would decrease the size of biggest cores with the price > of its increase in smaller cores. Uhm perhaps we should use inline functions then. What do you think? > >> + if ((c & 0x80) == 0x00) > >> + code = c; > >> + else if ((c & 0xe0) == 0xc0) > > > > These should be macroified. > > > Actually this are accelerated bitchecks (bit numbers follow specific > and easy pattern) and for real readability would have to be written in > binary but AFAIK binary notation isn't supported in C code and would > result in overly long strings Ah, right. Then it's not a problem. Maybe with (1 << 7) instead of 0x80, if you prefer. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] UTF-8 to UTF-16 transformation 2009-08-28 16:21 ` Robert Millan @ 2009-08-28 17:39 ` Vladimir 'phcoder' Serbinenko 0 siblings, 0 replies; 9+ messages in thread From: Vladimir 'phcoder' Serbinenko @ 2009-08-28 17:39 UTC (permalink / raw) To: The development of GRUB 2 >> I'm ok with renaming but whether a conversion goes to charset.mod is >> perhaps to be decided on case-by-case basis- >> > The conversion functions in kern/misc.c could eventually move there as well, >> > once UTF-8 support becomes optional in the kernel. >> utf16_to_utf8 can be moved now out of the kernel but it's used by some >> fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate >> module? This would decrease the size of biggest cores with the price >> of its increase in smaller cores. > > Uhm perhaps we should use inline functions then. What do you think? For me it's ok. During my tests with misc.c I checked this one too and on 3 filesystems using utf16_to_utf I had following results: core pc+fat+biosdisk: 7 bytes decrease core pc+hfsplus+biosdisk: 2 bytes increase. core pc+ntfs+biosdisk: 33 bytes decrease I haven't checked the size with filesystems not using utf16_to_utf8. Unfortunately USB uses this function too. I'll look if it's essential and inlining effects on core. > >> >> + if ((c & 0x80) == 0x00) >> >> + code = c; >> >> + else if ((c & 0xe0) == 0xc0) >> > >> > These should be macroified. >> > >> Actually this are accelerated bitchecks (bit numbers follow specific >> and easy pattern) and for real readability would have to be written in >> binary but AFAIK binary notation isn't supported in C code and would >> result in overly long strings > > Ah, right. Then it's not a problem. Maybe with (1 << 7) instead of 0x80, > if you prefer. > In second version I macroified these values (as GRUB_UINT8_*_(LEADING|TRAILING)_BIT[S] ) > -- > Robert Millan > > The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and > how) you may access your data; but nobody's threatening your freedom: we > still allow you to remove your data and not access it at all." > > > _______________________________________________ > Grub-devel mailing list > Grub-devel@gnu.org > http://lists.gnu.org/mailman/listinfo/grub-devel > -- Regards Vladimir 'phcoder' Serbinenko Personal git repository: http://repo.or.cz/w/grub2/phcoder.git ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-08-28 17:39 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-08-24 19:22 [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko 2009-08-24 19:23 ` Vladimir 'phcoder' Serbinenko 2009-08-26 0:31 ` Robert Millan 2009-08-26 23:27 ` Joe Auricchio 2009-08-27 21:31 ` Vladimir 'phcoder' Serbinenko 2009-08-27 22:11 ` Installing Solaris without a CDROM Seth Goldberg 2009-08-28 13:21 ` [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko 2009-08-28 16:21 ` Robert Millan 2009-08-28 17:39 ` Vladimir 'phcoder' Serbinenko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.