* [PATCH] Add UCS-2 to UTF-8 conversion utility function.
From: Andrzej Zaborowski @ 2009-07-05 3:13 UTC (permalink / raw)
To: ofono
---
src/util.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
src/util.h | 4 ++++
2 files changed, 63 insertions(+), 0 deletions(-)
diff --git a/src/util.c b/src/util.c
index 84ce507..7a96afe 100644
--- a/src/util.c
+++ b/src/util.c
@@ -26,6 +26,7 @@
#include <stdio.h>
#include <string.h>
#include <ctype.h>
+#include <endian.h>
#include <glib.h>
@@ -692,3 +693,61 @@ unsigned char *pack_7bit(const unsigned char *in, long len, int byte_offset,
return pack_7bit_own_buf(in, len, byte_offset, ussd, items_written,
terminator, buf);
}
+
+/*!
+ * Converts big-endian ISO/IEC 10646 UCS-2 encoded text into UTF-8 encoded
+ * text. The input buffer length is given in bytes, not 16-bit words.
+ *
+ * Returns a newly-allocated UTF-8 encoded string, or NULL if the conversion
+ * could not be performed. Stores the number of bytes read from the
+ * UCS-2 encoded string in items_read (if not NULL), not including the
+ * terminator character. Stores the number of bytes written into the UTF-8
+ * encoded string in items_written (if not NULL), not including the
+ * terminating '\0' character. The caller must free the result with g_free().
+ */
+char *convert_ucs2_to_utf8(const unsigned char *buffer, long len,
+ long *items_read, long *items_written,
+ unsigned short terminator)
+{
+ int i;
+ unsigned short ucs2;
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+ unsigned char *swap_buf;
+#endif
+ char *ret;
+ GError *error = NULL;
+
+ /* All UCS-2 text is valid UTF-16 text, but UTF-16 surrogate pairs
+ * are not valid in UCS-2, so first check that there are no
+ * surrogate code units in the buffer and then use
+ * g_utf16_to_utf8() on it. */
+ for (i = 0; i < len - 1; i += 2) {
+ ucs2 = (buffer[i] << 8) + buffer[i + 1];
+
+ if (ucs2 == terminator)
+ break;
+
+ if ((ucs2 & 0xf800) == 0xd800)
+ return NULL;
+ }
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+ /* g_malloc(0) returns NULL; allocate at least one byte. */
+ swap_buf = g_malloc(MAX(i, 1));
+
+ swab(buffer, swap_buf, i);
+ buffer = swap_buf;
+#endif
+
+ ret = g_utf16_to_utf8((const gunichar2 *) buffer, i / 2,
+ NULL, items_written, &error);
+ if (ret && items_read)
+ *items_read = i;
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+ g_free(swap_buf);
+#endif
+ g_clear_error(&error);
+
+ return ret;
+}
diff --git a/src/util.h b/src/util.h
index 6269630..8589f0c 100644
--- a/src/util.h
+++ b/src/util.h
@@ -56,3 +56,7 @@ unsigned char *pack_7bit_own_buf(const unsigned char *in, long len,
unsigned char *pack_7bit(const unsigned char *in, long len, int byte_offset,
gboolean ussd,
long *items_written, unsigned char terminator);
+
+char *convert_ucs2_to_utf8(const unsigned char *buffer, long len,
+ long *items_read, long *items_written,
+ unsigned short terminator);
--
1.6.0
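The following is a minimal standalone sketch in plain C that shows what the patch computes without GLib. It is not the patch's code: the name `ucs2be_to_utf8` and its malloc()-based return convention are hypothetical. It assumes big-endian UCS-2 input, which is the same assumption the patch's surrogate loop makes. The sketch rejects surrogate code units, stops at `terminator`, and encodes each code unit as one to three UTF-8 bytes.

```c
#include <stdlib.h>

/*
 * Hypothetical sketch (not the patch's code): convert big-endian UCS-2
 * to UTF-8 without GLib. `len` is in bytes; conversion stops at the
 * first `terminator` unit or at the end of the buffer. Returns a
 * malloc()ed, NUL-terminated string, or NULL on a surrogate code unit,
 * a negative length, or allocation failure.
 */
char *ucs2be_to_utf8(const unsigned char *buf, long len,
		     long *items_read, long *items_written,
		     unsigned short terminator)
{
	unsigned char *out;
	long i, n = 0;

	if (len < 0)
		return NULL;

	/* A UCS-2 unit needs at most 3 UTF-8 bytes; +1 for the NUL. */
	out = malloc((size_t) (len / 2) * 3 + 1);
	if (out == NULL)
		return NULL;

	for (i = 0; i + 1 < len; i += 2) {
		/* Assemble the unit from its two bytes: no swab() or
		 * <endian.h> check is needed on any host. */
		unsigned short c = (unsigned short) ((buf[i] << 8) | buf[i + 1]);

		if (c == terminator)
			break;

		/* 0xd800..0xdfff are UTF-16 surrogates, invalid in UCS-2. */
		if ((c & 0xf800) == 0xd800) {
			free(out);
			return NULL;
		}

		if (c < 0x80) {			/* 1 byte:  0xxxxxxx */
			out[n++] = (unsigned char) c;
		} else if (c < 0x800) {		/* 2 bytes: 110xxxxx 10xxxxxx */
			out[n++] = (unsigned char) (0xc0 | (c >> 6));
			out[n++] = (unsigned char) (0x80 | (c & 0x3f));
		} else {			/* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
			out[n++] = (unsigned char) (0xe0 | (c >> 12));
			out[n++] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
			out[n++] = (unsigned char) (0x80 | (c & 0x3f));
		}
	}

	out[n] = '\0';

	/* Same conventions as the patch: counts exclude terminator and NUL. */
	if (items_read)
		*items_read = i;
	if (items_written)
		*items_written = n;

	return (char *) out;
}
```

The main design difference from the patch is that the sketch assembles each 16-bit unit from its two bytes. That makes the `#if __BYTE_ORDER` branch, the temporary swab() buffer, and the <endian.h> dependency unnecessary.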
* Re: [PATCH] Add UCS-2 to UTF-8 conversion utility function.
From: Aki Niemi @ 2009-07-06 6:51 UTC (permalink / raw)
To: ofono
On Sun, 5 Jul 2009 05:13:45 +0200, Andrzej Zaborowski
<andrew.zaborowski@intel.com> wrote:
> +
> + ret = g_utf16_to_utf8((const gunichar2 *) buffer, i / 2,
> + NULL, items_written, &error);
Why not use iconv() with UCS-2BE to UTF-8?
Cheers,
Aki
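Aki's suggestion can be sketched with the POSIX iconv() API. This is a minimal example, not oFono code. The helper name `ucs2be_to_utf8_iconv` is hypothetical. The sketch assumes the C library's iconv accepts the encoding names "UCS-2BE" and "UTF-8", which glibc does. Unlike the patch, it does not stop at a terminator unit, so the caller passes only the bytes to convert. It also leaves input validation to the iconv implementation.

```c
#include <iconv.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert `len` bytes of big-endian UCS-2 into a
 * malloc()ed, NUL-terminated UTF-8 string with iconv(3). Returns NULL if
 * the C library does not support the conversion or the input is invalid.
 * Terminator handling is left to the caller.
 */
char *ucs2be_to_utf8_iconv(const char *buf, size_t len)
{
	iconv_t cd = iconv_open("UTF-8", "UCS-2BE");	/* (to, from) */
	size_t out_size = len / 2 * 3 + 1;	/* <= 3 bytes per unit, + NUL */
	size_t in_left = len, out_left = out_size - 1;
	char *in_p = (char *) buf;	/* iconv() takes char **, not const */
	char *out, *out_p;

	if (cd == (iconv_t) -1)
		return NULL;

	out = malloc(out_size);
	if (out == NULL) {
		iconv_close(cd);
		return NULL;
	}
	out_p = out;

	/* Fails with EILSEQ on an invalid sequence or EINVAL on an
	 * incomplete one, e.g. a trailing odd byte. */
	if (iconv(cd, &in_p, &in_left, &out_p, &out_left) == (size_t) -1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}

	*out_p = '\0';
	iconv_close(cd);
	return out;
}
```

Because the output buffer is sized for the worst case of three UTF-8 bytes per UCS-2 unit, a single iconv() call is enough and the E2BIG retry loop is not needed.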
* Re: [PATCH] Add UCS-2 to UTF-8 conversion utility function.
From: Denis Kenzior @ 2009-07-06 16:26 UTC (permalink / raw)
To: ofono
Hi,
The rest of the codebase uses g_convert(), which handles the necessary magic of
invoking iconv(), so this function is really not necessary. Use g_convert(buf,
bufsize, "UTF-8//TRANSLIT", "UCS-2BE", ...) instead.
Regards,
-Denis