* [PATCH] UTF-8 to UTF-16 transformation
@ 2009-08-24 19:22 Vladimir 'phcoder' Serbinenko
  2009-08-24 19:23 ` Vladimir 'phcoder' Serbinenko
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2009-08-24 19:22 UTC (permalink / raw)
  To: The development of GRUB 2

Split out from my newtree patch.

-- 
Regards
Vladimir 'phcoder' Serbinenko

Personal git repository: http://repo.or.cz/w/grub2/phcoder.git




* Re: [PATCH] UTF-8 to UTF-16 transformation
  2009-08-24 19:22 [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko
@ 2009-08-24 19:23 ` Vladimir 'phcoder' Serbinenko
  2009-08-26  0:31   ` Robert Millan
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2009-08-24 19:23 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 349 bytes --]

On Mon, Aug 24, 2009 at 9:22 PM, Vladimir 'phcoder'
Serbinenko<phcoder@gmail.com> wrote:
> Split out from my newtree patch.
>
> --
> Regards
> Vladimir 'phcoder' Serbinenko
>
> Personal git repository: http://repo.or.cz/w/grub2/phcoder.git
>



-- 
Regards
Vladimir 'phcoder' Serbinenko

Personal git repository: http://repo.or.cz/w/grub2/phcoder.git

[-- Attachment #2: utf.diff --]
[-- Type: text/plain, Size: 4878 bytes --]

2009-08-24  Vladimir Serbinenko  <phcoder@gmail.com>

	UTF-8 to UTF-16 transformation.

	* conf/common.rmk (pkglib_MODULES): Add utf.mod
	(utf_mod_SOURCES): New variable.
	(utf_mod_CFLAGS): Likewise.
	(utf_mod_LDFLAGS): Likewise.
	* include/grub/utf.h: New file.
	* lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c)

diff --git a/conf/common.rmk b/conf/common.rmk
index b0d3785..b5a6048 100644
--- a/conf/common.rmk
+++ b/conf/common.rmk
@@ -617,3 +617,8 @@ pkglib_MODULES += setjmp.mod
 setjmp_mod_SOURCES = lib/$(target_cpu)/setjmp.S
 setjmp_mod_ASFLAGS = $(COMMON_ASFLAGS)
 setjmp_mod_LDFLAGS = $(COMMON_LDFLAGS)
+
+pkglib_MODULES += utf.mod
+utf_mod_SOURCES = lib/utf.c
+utf_mod_CFLAGS = $(COMMON_CFLAGS)
+utf_mod_LDFLAGS = $(COMMON_LDFLAGS)
diff --git a/include/grub/utf.h b/include/grub/utf.h
new file mode 100644
index 0000000..2091916
--- /dev/null
+++ b/include/grub/utf.h
@@ -0,0 +1,29 @@
+/*
+ *  GRUB  --  GRand Unified Bootloader
+ *  Copyright (C) 2009  Free Software Foundation, Inc.
+ *
+ *  GRUB is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 3 of the License, or
+ *  (at your option) any later version.
+ *
+ *  GRUB is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with GRUB.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef GRUB_UTF_HEADER
+#define GRUB_UTF_HEADER	1
+
+#include <grub/types.h>
+
+grub_ssize_t
+grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize,
+		    const grub_uint8_t *src, grub_size_t srcsize,
+		    const grub_uint8_t **srcend);
+
+#endif
diff --git a/lib/utf.c b/lib/utf.c
new file mode 100644
index 0000000..1f89f2f
--- /dev/null
+++ b/lib/utf.c
@@ -0,0 +1,116 @@
+/*
+ *  GRUB  --  GRand Unified Bootloader
+ *  Copyright (C) 1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009  Free Software Foundation, Inc.
+ *
+ *  GRUB is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 3 of the License, or
+ *  (at your option) any later version.
+ *
+ *  GRUB is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with GRUB.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* Convert a (possibly null-terminated) UTF-8 string of at most SRCSIZE
+   bytes (if SRCSIZE is -1, it is ignored) in length to a UTF-16 string.
+   Return the number of characters converted. DEST must be able to hold
+   at least DESTSIZE characters. If an invalid sequence is found, return -1.
+   If SRCEND is not NULL, then *SRCEND is set to the next byte after the
+   last byte used in SRC.  */
+
+#include <grub/utf.h>
+
+grub_ssize_t
+grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize,
+		    const grub_uint8_t *src, grub_size_t srcsize,
+		    const grub_uint8_t **srcend)
+{
+  grub_uint16_t *p = dest;
+  int count = 0;
+  grub_uint32_t code = 0;
+
+  if (srcend)
+    *srcend = src;
+
+  while (srcsize && destsize)
+    {
+      grub_uint32_t c = *src++;
+      if (srcsize != (grub_size_t)-1)
+	srcsize--;
+      if (count)
+	{
+	  if ((c & 0xc0) != 0x80)
+	    {
+	      /* invalid */
+	      return -1;
+	    }
+	  else
+	    {
+	      code <<= 6;
+	      code |= (c & 0x3f);
+	      count--;
+	    }
+	}
+      else
+	{
+	  if (c == 0)
+	    break;
+
+	  if ((c & 0x80) == 0x00)
+	    code = c;
+	  else if ((c & 0xe0) == 0xc0)
+	    {
+	      count = 1;
+	      code = c & 0x1f;
+	    }
+	  else if ((c & 0xf0) == 0xe0)
+	    {
+	      count = 2;
+	      code = c & 0x0f;
+	    }
+	  else if ((c & 0xf8) == 0xf0)
+	    {
+	      count = 3;
+	      code = c & 0x07;
+	    }
+	  else if ((c & 0xfc) == 0xf8)
+	    {
+	      count = 4;
+	      code = c & 0x03;
+	    }
+	  else if ((c & 0xfe) == 0xfc)
+	    {
+	      count = 5;
+	      code = c & 0x01;
+	    }
+	  else
+	    return -1;
+	}
+
+      if (count == 0)
+	{
+	  if (destsize < 2 && code >= 0x10000)
+	    break;
+	  if (code >= 0x10000)
+	    {
+	      *p++ = 0xD800 + (((code - 0x10000) >> 10) & 0x3ff);
+	      *p++ = 0xDC00 + ((code - 0x10000) & 0x3ff);
+	      destsize -= 2;
+	    }
+	  else
+	    {
+	      *p++ = code;
+	      destsize--;
+	    }
+	}
+    }
+
+  if (srcend)
+    *srcend = src;
+  return p - dest;
+}
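
For reference, UTF-16 represents code points from U+10000 through U+10FFFF as a pair of surrogates that carry 10 bits each, so the split shifts by 10 and masks with 0x3ff. A minimal standalone sketch of that arithmetic follows; the helper name is hypothetical and plain <stdint.h> types stand in for grub_uint*_t, so this is an illustration rather than code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Split a supplementary-plane code point (U+10000..U+10FFFF) into a
   UTF-16 surrogate pair.  Hypothetical helper, not part of the patch.  */
static void
utf16_encode_surrogates (uint32_t code, uint16_t *high, uint16_t *low)
{
  uint32_t v;

  assert (code >= 0x10000 && code <= 0x10FFFF);
  v = code - 0x10000;                          /* 20-bit value */
  *high = (uint16_t) (0xD800 + (v >> 10));     /* top 10 bits */
  *low = (uint16_t) (0xDC00 + (v & 0x3ff));    /* bottom 10 bits */
}
```

With this split, U+1F600 becomes the pair 0xD83D 0xDE00, and the boundary value U+10000 itself becomes 0xD800 0xDC00.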


* Re: [PATCH] UTF-8 to UTF-16 transformation
  2009-08-24 19:23 ` Vladimir 'phcoder' Serbinenko
@ 2009-08-26  0:31   ` Robert Millan
  2009-08-26 23:27     ` Joe Auricchio
  2009-08-27 21:31     ` Vladimir 'phcoder' Serbinenko
  0 siblings, 2 replies; 9+ messages in thread
From: Robert Millan @ 2009-08-26  0:31 UTC (permalink / raw)
  To: The development of GRUB 2

On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote:

> 2009-08-24  Vladimir Serbinenko  <phcoder@gmail.com>
> 
> 	UTF-8 to UTF-16 transformation.
> 
> 	* conf/common.rmk (pkglib_MODULES): Add utf.mod
> 	(utf_mod_SOURCES): New variable.
> 	(utf_mod_CFLAGS): Likewise.
> 	(utf_mod_LDFLAGS): Likewise.
> 	* include/grub/utf.h: New file.
> 	* lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c)

Sounds like we could end up needing more of this (to other charsets), so
why not give this module a generic name to hint as to where it can be added?

The conversion functions in kern/misc.c could eventually move there as well,
once UTF-8 support becomes optional in the kernel.

GNU libc has "iconv" command and "iconv_*" facilities for charset conversion,
how about iconv.mod for consistency?

> +	  if ((c & 0x80) == 0x00)
> +	    code = c;
> +	  else if ((c & 0xe0) == 0xc0)

These should be macroified.

-- 
Robert Millan

  The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and
  how) you may access your data; but nobody's threatening your freedom: we
  still allow you to remove your data and not access it at all."




* Re: [PATCH] UTF-8 to UTF-16 transformation
  2009-08-26  0:31   ` Robert Millan
@ 2009-08-26 23:27     ` Joe Auricchio
  2009-08-27 21:31     ` Vladimir 'phcoder' Serbinenko
  1 sibling, 0 replies; 9+ messages in thread
From: Joe Auricchio @ 2009-08-26 23:27 UTC (permalink / raw)
  To: The development of GRUB 2

On Tue, Aug 25, 2009 at 17:31, Robert Millan<rmh@aybabtu.com> wrote:

> GNU libc has "iconv" command and "iconv_*" facilities for charset conversion,
> how about iconv.mod for consistency?

My 2 cents is: that might be confusing. At least, if I saw iconv.mod,
I would assume it was really GNU libc's iconv library.

char_conv.mod?
charset_conv.mod?
charset.mod?




* Re: [PATCH] UTF-8 to UTF-16 transformation
  2009-08-26  0:31   ` Robert Millan
  2009-08-26 23:27     ` Joe Auricchio
@ 2009-08-27 21:31     ` Vladimir 'phcoder' Serbinenko
  2009-08-27 22:11       ` Installing Solaris without a CDROM Seth Goldberg
                         ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2009-08-27 21:31 UTC (permalink / raw)
  To: The development of GRUB 2

On Wed, Aug 26, 2009 at 2:31 AM, Robert Millan<rmh@aybabtu.com> wrote:
> On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote:
>
>> 2009-08-24  Vladimir Serbinenko  <phcoder@gmail.com>
>>
>>       UTF-8 to UTF-16 transformation.
>>
>>       * conf/common.rmk (pkglib_MODULES): Add utf.mod
>>       (utf_mod_SOURCES): New variable.
>>       (utf_mod_CFLAGS): Likewise.
>>       (utf_mod_LDFLAGS): Likewise.
>>       * include/grub/utf.h: New file.
>>       * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c)
>
> Sounds like we could end up needing more of this (to other charsets), so
> why not give this module a generic name to hint as to where it can be added?
>
I'm OK with renaming, but whether a given conversion goes into charset.mod
should perhaps be decided on a case-by-case basis.
> The conversion functions in kern/misc.c could eventually move there as well,
> once UTF-8 support becomes optional in the kernel.
utf16_to_utf8 could be moved out of the kernel now, but it's used by some
fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate
module? That would decrease the size of the biggest cores at the price
of an increase in the smaller ones.
>
> GNU libc has "iconv" command and "iconv_*" facilities for charset conversion,
> how about iconv.mod for consistency?
>
>> +       if ((c & 0x80) == 0x00)
>> +         code = c;
>> +       else if ((c & 0xe0) == 0xc0)
>
> These should be macroified.
>
Actually these are accelerated bit checks (the bit numbers follow a specific
and easy pattern). For real readability they would have to be written in
binary, but AFAIK binary notation isn't supported in C code and would
result in overly long strings.
>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> http://lists.gnu.org/mailman/listinfo/grub-devel
>



-- 
Regards
Vladimir 'phcoder' Serbinenko

Personal git repository: http://repo.or.cz/w/grub2/phcoder.git




* Installing Solaris without a CDROM
  2009-08-27 21:31     ` Vladimir 'phcoder' Serbinenko
@ 2009-08-27 22:11       ` Seth Goldberg
  2009-08-28 13:21       ` [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko
  2009-08-28 16:21       ` Robert Millan
  2 siblings, 0 replies; 9+ messages in thread
From: Seth Goldberg @ 2009-08-27 22:11 UTC (permalink / raw)
  To: The development of GRUB 2

Hi,

  You asked about it -- yes, there is.  You can install it via a USB storage 
device (if your BIOS can boot from it).

Take a look at this:
http://opensolaris.org/jive/thread.jspa?threadID=107723&tstart=45


  --S





* Re: [PATCH] UTF-8 to UTF-16 transformation
  2009-08-27 21:31     ` Vladimir 'phcoder' Serbinenko
  2009-08-27 22:11       ` Installing Solaris without a CDROM Seth Goldberg
@ 2009-08-28 13:21       ` Vladimir 'phcoder' Serbinenko
  2009-08-28 16:21       ` Robert Millan
  2 siblings, 0 replies; 9+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2009-08-28 13:21 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]

On Thu, Aug 27, 2009 at 11:31 PM, Vladimir 'phcoder'
Serbinenko<phcoder@gmail.com> wrote:
> On Wed, Aug 26, 2009 at 2:31 AM, Robert Millan<rmh@aybabtu.com> wrote:
>> On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote:
>>
>>> 2009-08-24  Vladimir Serbinenko  <phcoder@gmail.com>
>>>
>>>       UTF-8 to UTF-16 transformation.
>>>
>>>       * conf/common.rmk (pkglib_MODULES): Add utf.mod
>>>       (utf_mod_SOURCES): New variable.
>>>       (utf_mod_CFLAGS): Likewise.
>>>       (utf_mod_LDFLAGS): Likewise.
>>>       * include/grub/utf.h: New file.
>>>       * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c)
>>
>> Sounds like we could end up needing more of this (to other charsets), so
>> why not give this module a generic name to hint as to where it can be added?
>>
> I'm OK with renaming, but whether a given conversion goes into charset.mod
> should perhaps be decided on a case-by-case basis.
>> The conversion functions in kern/misc.c could eventually move there as well,
>> once UTF-8 support becomes optional in the kernel.
> utf16_to_utf8 could be moved out of the kernel now, but it's used by some
> fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate
> module? That would decrease the size of the biggest cores at the price
> of an increase in the smaller ones.
>>
>> GNU libc has "iconv" command and "iconv_*" facilities for charset conversion,
>> how about iconv.mod for consistency?
>>
>>> +       if ((c & 0x80) == 0x00)
>>> +         code = c;
>>> +       else if ((c & 0xe0) == 0xc0)
>>
>> These should be macroified.
>>
> Actually these are accelerated bit checks (the bit numbers follow a specific
> and easy pattern). For real readability they would have to be written in
> binary, but AFAIK binary notation isn't supported in C code and would
> result in overly long strings.
>
>
>
> --
> Regards
> Vladimir 'phcoder' Serbinenko
>
> Personal git repository: http://repo.or.cz/w/grub2/phcoder.git
>



-- 
Regards
Vladimir 'phcoder' Serbinenko

Personal git repository: http://repo.or.cz/w/grub2/phcoder.git

[-- Attachment #2: utf.diff --]
[-- Type: text/plain, Size: 6422 bytes --]

diff --git a/ChangeLog b/ChangeLog
index ab542e2..367ab05 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -158,6 +158,17 @@
 
 2009-08-24  Vladimir Serbinenko  <phcoder@gmail.com>
 
+	UTF-8 to UTF-16 transformation.
+
+	* conf/common.rmk (pkglib_MODULES): Add charset.mod.
+	(charset_mod_SOURCES): New variable.
+	(charset_mod_CFLAGS): Likewise.
+	(charset_mod_LDFLAGS): Likewise.
+	* include/grub/charset.h: New file.
+	* lib/charset.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c)
+
+2009-08-24  Vladimir Serbinenko  <phcoder@gmail.com>
+
 	* script/sh/function.c (grub_script_function_find): Cut error message
 	not to flood terminal.
 	* script/sh/lexer.c (grub_script_yylex): Remove command line length
diff --git a/conf/common.rmk b/conf/common.rmk
index 7727f19..735e57a 100644
--- a/conf/common.rmk
+++ b/conf/common.rmk
@@ -633,3 +633,8 @@ pkglib_MODULES += setjmp.mod
 setjmp_mod_SOURCES = lib/$(target_cpu)/setjmp.S
 setjmp_mod_ASFLAGS = $(COMMON_ASFLAGS)
 setjmp_mod_LDFLAGS = $(COMMON_LDFLAGS)
+
+pkglib_MODULES += charset.mod
+charset_mod_SOURCES = lib/charset.c
+charset_mod_CFLAGS = $(COMMON_CFLAGS)
+charset_mod_LDFLAGS = $(COMMON_LDFLAGS)
diff --git a/include/grub/charset.h b/include/grub/charset.h
new file mode 100644
index 0000000..22b6724
--- /dev/null
+++ b/include/grub/charset.h
@@ -0,0 +1,50 @@
+/*
+ *  GRUB  --  GRand Unified Bootloader
+ *  Copyright (C) 2009  Free Software Foundation, Inc.
+ *
+ *  GRUB is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 3 of the License, or
+ *  (at your option) any later version.
+ *
+ *  GRUB is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with GRUB.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef GRUB_CHARSET_HEADER
+#define GRUB_CHARSET_HEADER	1
+
+#include <grub/types.h>
+
+#define GRUB_UINT8_1_LEADINGBIT 0x80
+#define GRUB_UINT8_2_LEADINGBITS 0xc0
+#define GRUB_UINT8_3_LEADINGBITS 0xe0
+#define GRUB_UINT8_4_LEADINGBITS 0xf0
+#define GRUB_UINT8_5_LEADINGBITS 0xf8
+#define GRUB_UINT8_6_LEADINGBITS 0xfc
+#define GRUB_UINT8_7_LEADINGBITS 0xfe
+
+#define GRUB_UINT8_1_TRAILINGBIT 0x01
+#define GRUB_UINT8_2_TRAILINGBITS 0x03
+#define GRUB_UINT8_3_TRAILINGBITS 0x07
+#define GRUB_UINT8_4_TRAILINGBITS 0x0f
+#define GRUB_UINT8_5_TRAILINGBITS 0x1f
+#define GRUB_UINT8_6_TRAILINGBITS 0x3f
+
+#define GRUB_UCS2_LIMIT 0x10000
+#define GRUB_UTF16_UPPER_SURROGATE(code) \
+  (0xD800 + ((((code) - GRUB_UCS2_LIMIT) >> 10) & 0x3ff))
+#define GRUB_UTF16_LOWER_SURROGATE(code) \
+  (0xDC00 + (((code) - GRUB_UCS2_LIMIT) & 0x3ff))
+
+grub_ssize_t
+grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize,
+		    const grub_uint8_t *src, grub_size_t srcsize,
+		    const grub_uint8_t **srcend);
+
+#endif
diff --git a/lib/charset.c b/lib/charset.c
new file mode 100644
index 0000000..8bc5b91
--- /dev/null
+++ b/lib/charset.c
@@ -0,0 +1,116 @@
+/*
+ *  GRUB  --  GRand Unified Bootloader
+ *  Copyright (C) 1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009  Free Software Foundation, Inc.
+ *
+ *  GRUB is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 3 of the License, or
+ *  (at your option) any later version.
+ *
+ *  GRUB is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with GRUB.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* Convert a (possibly null-terminated) UTF-8 string of at most SRCSIZE
+   bytes (if SRCSIZE is -1, it is ignored) in length to a UTF-16 string.
+   Return the number of characters converted. DEST must be able to hold
+   at least DESTSIZE characters. If an invalid sequence is found, return -1.
+   If SRCEND is not NULL, then *SRCEND is set to the next byte after the
+   last byte used in SRC.  */
+
+#include <grub/charset.h>
+
+grub_ssize_t
+grub_utf8_to_utf16 (grub_uint16_t *dest, grub_size_t destsize,
+		    const grub_uint8_t *src, grub_size_t srcsize,
+		    const grub_uint8_t **srcend)
+{
+  grub_uint16_t *p = dest;
+  int count = 0;
+  grub_uint32_t code = 0;
+
+  if (srcend)
+    *srcend = src;
+
+  while (srcsize && destsize)
+    {
+      grub_uint32_t c = *src++;
+      if (srcsize != (grub_size_t)-1)
+	srcsize--;
+      if (count)
+	{
+	  if ((c & GRUB_UINT8_2_LEADINGBITS) != GRUB_UINT8_1_LEADINGBIT)
+	    {
+	      /* invalid */
+	      return -1;
+	    }
+	  else
+	    {
+	      code <<= 6;
+	      code |= (c & GRUB_UINT8_6_TRAILINGBITS);
+	      count--;
+	    }
+	}
+      else
+	{
+	  if (c == 0)
+	    break;
+
+	  if ((c & GRUB_UINT8_1_LEADINGBIT) == 0)
+	    code = c;
+	  else if ((c & GRUB_UINT8_3_LEADINGBITS) == GRUB_UINT8_2_LEADINGBITS)
+	    {
+	      count = 1;
+	      code = c & GRUB_UINT8_5_TRAILINGBITS;
+	    }
+	  else if ((c & GRUB_UINT8_4_LEADINGBITS) == GRUB_UINT8_3_LEADINGBITS)
+	    {
+	      count = 2;
+	      code = c & GRUB_UINT8_4_TRAILINGBITS;
+	    }
+	  else if ((c & GRUB_UINT8_5_LEADINGBITS) == GRUB_UINT8_4_LEADINGBITS)
+	    {
+	      count = 3;
+	      code = c & GRUB_UINT8_3_TRAILINGBITS;
+	    }
+	  else if ((c & GRUB_UINT8_6_LEADINGBITS) == GRUB_UINT8_5_LEADINGBITS)
+	    {
+	      count = 4;
+	      code = c & GRUB_UINT8_2_TRAILINGBITS;
+	    }
+	  else if ((c & GRUB_UINT8_7_LEADINGBITS) == GRUB_UINT8_6_LEADINGBITS)
+	    {
+	      count = 5;
+	      code = c & GRUB_UINT8_1_TRAILINGBIT;
+	    }
+	  else
+	    return -1;
+	}
+
+      if (count == 0)
+	{
+	  if (destsize < 2 && code >= GRUB_UCS2_LIMIT)
+	    break;
+	  if (code >= GRUB_UCS2_LIMIT)
+	    {
+	      *p++ = GRUB_UTF16_UPPER_SURROGATE (code);
+	      *p++ = GRUB_UTF16_LOWER_SURROGATE (code);
+	      destsize -= 2;
+	    }
+	  else
+	    {
+	      *p++ = code;
+	      destsize--;
+	    }
+	}
+    }
+
+  if (srcend)
+    *srcend = src;
+  return p - dest;
+}
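
For experimenting with the conversion outside GRUB, a stricter standalone variant can be sketched as follows. This is not the patch's code: it assumes the modern 4-byte limit of UTF-8, takes an explicit source length instead of the -1 convention, has no SRCEND parameter, and treats truncated input and a full destination as errors. The function name is hypothetical and plain <stdint.h> types stand in for grub_uint*_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert SRCSIZE bytes of UTF-8 to UTF-16.  Returns the number of 16-bit
   units written, or -1 on malformed input or if DEST is too small.
   Hypothetical sketch; overlong encodings are not rejected here.  */
static long
utf8_to_utf16_strict (uint16_t *dest, size_t destsize,
                      const uint8_t *src, size_t srcsize)
{
  size_t in = 0, out = 0;

  assert (dest != NULL && src != NULL);
  while (in < srcsize)
    {
      uint32_t code = src[in++];
      int count;

      if (code < 0x80)
        count = 0;
      else if ((code & 0xe0) == 0xc0)
        { count = 1; code &= 0x1f; }
      else if ((code & 0xf0) == 0xe0)
        { count = 2; code &= 0x0f; }
      else if ((code & 0xf8) == 0xf0)
        { count = 3; code &= 0x07; }
      else
        return -1;              /* stray continuation or 5/6-byte form */

      while (count--)
        {
          /* A sequence cut short by the end of input is an error.  */
          if (in == srcsize || (src[in] & 0xc0) != 0x80)
            return -1;
          code = (code << 6) | (src[in++] & 0x3f);
        }

      /* Surrogates and values past U+10FFFF have no UTF-16 form.  */
      if ((code >= 0xD800 && code <= 0xDFFF) || code > 0x10FFFF)
        return -1;

      if (code >= 0x10000)
        {
          if (destsize - out < 2)
            return -1;
          code -= 0x10000;
          dest[out++] = (uint16_t) (0xD800 + (code >> 10));
          dest[out++] = (uint16_t) (0xDC00 + (code & 0x3ff));
        }
      else
        {
          if (out == destsize)
            return -1;
          dest[out++] = (uint16_t) code;
        }
    }
  return (long) out;
}
```

Returning -1 when the destination fills up is a design choice of this sketch; the patch instead stops converting and reports how far it got through *SRCEND, which suits callers that convert in chunks.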


* Re: [PATCH] UTF-8 to UTF-16 transformation
  2009-08-27 21:31     ` Vladimir 'phcoder' Serbinenko
  2009-08-27 22:11       ` Installing Solaris without a CDROM Seth Goldberg
  2009-08-28 13:21       ` [PATCH] UTF-8 to UTF-16 transformation Vladimir 'phcoder' Serbinenko
@ 2009-08-28 16:21       ` Robert Millan
  2009-08-28 17:39         ` Vladimir 'phcoder' Serbinenko
  2 siblings, 1 reply; 9+ messages in thread
From: Robert Millan @ 2009-08-28 16:21 UTC (permalink / raw)
  To: The development of GRUB 2

On Thu, Aug 27, 2009 at 11:31:28PM +0200, Vladimir 'phcoder' Serbinenko wrote:
> On Wed, Aug 26, 2009 at 2:31 AM, Robert Millan<rmh@aybabtu.com> wrote:
> > On Mon, Aug 24, 2009 at 09:23:22PM +0200, Vladimir 'phcoder' Serbinenko wrote:
> >
> >> 2009-08-24  Vladimir Serbinenko  <phcoder@gmail.com>
> >>
> >>       UTF-8 to UTF-16 transformation.
> >>
> >>       * conf/common.rmk (pkglib_MODULES): Add utf.mod
> >>       (utf_mod_SOURCES): New variable.
> >>       (utf_mod_CFLAGS): Likewise.
> >>       (utf_mod_LDFLAGS): Likewise.
> >>       * include/grub/utf.h: New file.
> >>       * lib/utf.c: New file. (Based on grub_utf8_to_ucs4 from kern/misc.c)
> >
> > Sounds like we could end up needing more of this (to other charsets), so
> > why not give this module a generic name to hint as to where it can be added?
> >
> I'm OK with renaming, but whether a given conversion goes into charset.mod
> should perhaps be decided on a case-by-case basis.
> > The conversion functions in kern/misc.c could eventually move there as well,
> > once UTF-8 support becomes optional in the kernel.
> utf16_to_utf8 could be moved out of the kernel now, but it's used by some
> fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate
> module? That would decrease the size of the biggest cores at the price
> of an increase in the smaller ones.

Uhm perhaps we should use inline functions then.  What do you think?

> >> +       if ((c & 0x80) == 0x00)
> >> +         code = c;
> >> +       else if ((c & 0xe0) == 0xc0)
> >
> > These should be macroified.
> >
> Actually these are accelerated bit checks (the bit numbers follow a specific
> and easy pattern). For real readability they would have to be written in
> binary, but AFAIK binary notation isn't supported in C code and would
> result in overly long strings.

Ah, right.  Then it's not a problem.  Maybe with (1 << 7) instead of 0x80,
if you prefer.

-- 
Robert Millan

  The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and
  how) you may access your data; but nobody's threatening your freedom: we
  still allow you to remove your data and not access it at all."




* Re: [PATCH] UTF-8 to UTF-16 transformation
  2009-08-28 16:21       ` Robert Millan
@ 2009-08-28 17:39         ` Vladimir 'phcoder' Serbinenko
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2009-08-28 17:39 UTC (permalink / raw)
  To: The development of GRUB 2

>> I'm OK with renaming, but whether a given conversion goes into charset.mod
>> should perhaps be decided on a case-by-case basis.
>> > The conversion functions in kern/misc.c could eventually move there as well,
>> > once UTF-8 support becomes optional in the kernel.
>> utf16_to_utf8 could be moved out of the kernel now, but it's used by some
>> fs modules (e.g. fat). Perhaps utf16_to_utf8 should be a separate
>> module? That would decrease the size of the biggest cores at the price
>> of an increase in the smaller ones.
>
> Uhm perhaps we should use inline functions then.  What do you think?
For me it's OK. During my tests with misc.c I checked this one too, and
with 3 filesystems that use utf16_to_utf8 I got the following results:
core pc+fat+biosdisk: 7 bytes decrease
core pc+hfsplus+biosdisk: 2 bytes increase
core pc+ntfs+biosdisk: 33 bytes decrease

I haven't checked the size with filesystems that don't use utf16_to_utf8.
Unfortunately USB uses this function too. I'll check whether it's essential
there and how inlining affects the core size.
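
As a rough picture of the inline approach under discussion: a `static inline` function defined in a header gives every module that includes it its own copy, so nothing has to be exported by the kernel or by a separate module. The helper below is purely hypothetical (the real utf16_to_utf8 signature is not shown in this thread) and only encodes a single code point below U+10000.

```c
#include <assert.h>
#include <stdint.h>

/* Header-style helper: encode one code point below U+10000 as UTF-8 into
   OUT (at least 3 bytes) and return the number of bytes written.
   Hypothetical name and interface, for illustration only.  */
static inline int
utf8_encode_bmp (uint32_t code, uint8_t *out)
{
  assert (code < 0x10000);
  if (code < 0x80)
    {
      out[0] = (uint8_t) code;
      return 1;
    }
  if (code < 0x800)
    {
      out[0] = (uint8_t) (0xC0 | (code >> 6));
      out[1] = (uint8_t) (0x80 | (code & 0x3f));
      return 2;
    }
  out[0] = (uint8_t) (0xE0 | (code >> 12));
  out[1] = (uint8_t) (0x80 | ((code >> 6) & 0x3f));
  out[2] = (uint8_t) (0x80 | (code & 0x3f));
  return 3;
}
```

Whether this saves space depends on how many callers end up in one image, which is why per-configuration measurements like the ones above are the deciding factor.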
>
>> >> +       if ((c & 0x80) == 0x00)
>> >> +         code = c;
>> >> +       else if ((c & 0xe0) == 0xc0)
>> >
>> > These should be macroified.
>> >
>> Actually these are accelerated bit checks (the bit numbers follow a specific
>> and easy pattern). For real readability they would have to be written in
>> binary, but AFAIK binary notation isn't supported in C code and would
>> result in overly long strings.
>
> Ah, right.  Then it's not a problem.  Maybe with (1 << 7) instead of 0x80,
> if you prefer.
>
In the second version I macroified these values (as
GRUB_UINT8_*_(LEADING|TRAILING)BIT[S]).



-- 
Regards
Vladimir 'phcoder' Serbinenko

Personal git repository: http://repo.or.cz/w/grub2/phcoder.git



