* [PATCH] Add UCS-2 to UTF-8 conversion utility function.
@ 2009-07-05  3:13 Andrzej Zaborowski
  2009-07-06  6:51 ` Aki Niemi
From: Andrzej Zaborowski @ 2009-07-05  3:13 UTC
  To: ofono

---
 src/util.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/util.h |    4 ++++
 2 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/src/util.c b/src/util.c
index 84ce507..7a96afe 100644
--- a/src/util.c
+++ b/src/util.c
@@ -26,6 +26,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <ctype.h>
+#include <endian.h>
 
 #include <glib.h>
 
@@ -692,3 +693,61 @@ unsigned char *pack_7bit(const unsigned char *in, long len, int byte_offset,
 	return pack_7bit_own_buf(in, len, byte_offset, ussd, items_written,
 					terminator, buf);
 }
+
+/*!
+ * Converts big-endian ISO/IEC 10646 UCS-2 encoded text into UTF-8 encoded
+ * text.  Input buffer length is given in bytes, not words.
+ *
+ * Returns a newly-allocated UTF-8 encoded string or NULL if the conversion
+ * could not be performed.  Returns the number of bytes read from the
+ * UCS-2 encoded string in items_read (if not NULL), not including the
+ * terminator character.  Returns the number of bytes written into the UTF-8
+ * encoded string in items_written (if not NULL), not including the
+ * terminating '\0'.  The caller is responsible for freeing the result.
+ */
+char *convert_ucs2_to_utf8(const unsigned char *buffer, long len,
+				long *items_read, long *items_written,
+				unsigned short terminator)
+{
+	long i;
+	unsigned short ucs2;
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+	unsigned char *swap_buf;
+#endif
+	char *ret;
+	GError *error = NULL;
+
+	/* All UCS-2 text is valid UTF-16 text, but UTF-16 surrogate code
+	 * units (0xD800-0xDFFF) are not valid in UCS-2, so first check that
+	 * none occur before the terminator and then use g_utf16_to_utf8()
+	 * on the buffer.  */
+	for (i = 0; i < len - 1; i += 2) {
+		ucs2 = (buffer[i] << 8) | buffer[i + 1];
+
+		if (ucs2 == terminator)
+			break;
+
+		if ((ucs2 & 0xf800) == 0xd800)
+			return NULL;
+	}
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+	/* g_malloc() aborts instead of returning NULL; the extra byte
+	 * keeps swap_buf non-NULL for an empty string (i == 0) */
+	swap_buf = g_malloc(i + 1);
+	swab(buffer, swap_buf, i);
+	buffer = swap_buf;
+#endif
+
+	ret = g_utf16_to_utf8((const gunichar2 *) buffer, i / 2,
+			NULL, items_written, &error);
+	if (ret && items_read)
+		*items_read = i;
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+	g_free(swap_buf);
+#endif
+	g_clear_error(&error);
+
+	return ret;
+}
diff --git a/src/util.h b/src/util.h
index 6269630..8589f0c 100644
--- a/src/util.h
+++ b/src/util.h
@@ -56,3 +56,7 @@ unsigned char *pack_7bit_own_buf(const unsigned char *in, long len,
 unsigned char *pack_7bit(const unsigned char *in, long len, int byte_offset,
 				gboolean ussd,
 				long *items_written, unsigned char terminator);
+
+char *convert_ucs2_to_utf8(const unsigned char *buffer, long len,
+				long *items_read, long *items_written,
+				unsigned short terminator);
-- 
1.6.0
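The conversion this patch performs can also be sketched without GLib, which makes the validation and encoding steps explicit. The helper below is hypothetical (ucs2be_to_utf8_sketch is not an oFono function) and assumes big-endian input, as the patch's (buffer[i] << 8) decoding does. It applies the same rules as the patch (stop at the terminator, reject code units in 0xD800-0xDFFF) and then encodes each code point as one to three UTF-8 bytes instead of calling g_utf16_to_utf8().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of oFono): decode big-endian UCS-2 into
 * UTF-8 using plain C, applying the patch's rules: stop at `terminator`,
 * reject surrogate code units (0xD800-0xDFFF), report bytes read/written.
 */
char *ucs2be_to_utf8_sketch(const unsigned char *buf, long len,
			    long *items_read, long *items_written,
			    unsigned short terminator)
{
	long i, units, out = 0;
	char *ret;

	/* First pass: find the terminator and validate, like the patch. */
	for (i = 0; i < len - 1; i += 2) {
		unsigned short c = (unsigned short) ((buf[i] << 8) | buf[i + 1]);

		if (c == terminator)
			break;
		if ((c & 0xf800) == 0xd800)
			return NULL;	/* surrogate: not valid UCS-2 */
	}
	units = i / 2;

	/* Each BMP code point needs at most 3 UTF-8 bytes, plus the NUL. */
	ret = malloc(units * 3 + 1);
	if (ret == NULL)
		return NULL;

	/* Second pass: encode each code unit as 1, 2 or 3 UTF-8 bytes. */
	for (i = 0; i < units * 2; i += 2) {
		unsigned short c = (unsigned short) ((buf[i] << 8) | buf[i + 1]);

		if (c < 0x80) {
			ret[out++] = (char) c;
		} else if (c < 0x800) {
			ret[out++] = (char) (0xc0 | (c >> 6));
			ret[out++] = (char) (0x80 | (c & 0x3f));
		} else {
			ret[out++] = (char) (0xe0 | (c >> 12));
			ret[out++] = (char) (0x80 | ((c >> 6) & 0x3f));
			ret[out++] = (char) (0x80 | (c & 0x3f));
		}
	}
	ret[out] = '\0';

	if (items_read)
		*items_read = units * 2;
	if (items_written)
		*items_written = out;

	return ret;
}
```

Called on the bytes 00 41 00 E9 20 AC 00 00 with a terminator of 0x0000, the sketch returns "Aé€" with 6 bytes read and 6 bytes written. An empty input yields an empty string rather than NULL.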


* Re: [PATCH] Add UCS-2 to UTF-8 conversion utility function.
  2009-07-05  3:13 [PATCH] Add UCS-2 to UTF-8 conversion utility function Andrzej Zaborowski
@ 2009-07-06  6:51 ` Aki Niemi
From: Aki Niemi @ 2009-07-06  6:51 UTC
  To: ofono

On Sun,  5 Jul 2009 05:13:45 +0200, Andrzej Zaborowski
<andrew.zaborowski@intel.com> wrote:
> +
> +	ret = g_utf16_to_utf8((const gunichar2 *) buffer, i / 2,
> +			NULL, items_written, &error);

Why not use iconv() with UCS-2BE to UTF-8?

Cheers,
Aki
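Aki's suggestion can be sketched with POSIX iconv(3). This is a minimal illustration, assuming a C library whose iconv knows the "UCS-2BE" encoding name (glibc and GNU libiconv do); ucs2be_to_utf8_iconv is a hypothetical name. Unlike the patch, it converts the whole buffer rather than stopping at a terminator, and it leaves surrogate rejection to the iconv implementation.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper illustrating the iconv() route: convert `len` bytes
 * of big-endian UCS-2 into a malloc()ed, NUL-terminated UTF-8 string.
 * Returns NULL on failure; stores the UTF-8 byte count in *out_len.
 */
char *ucs2be_to_utf8_iconv(const char *in, size_t len, size_t *out_len)
{
	iconv_t cd;
	char *out, *outp;
	char *inp = (char *) in;		/* POSIX iconv() takes char ** */
	size_t left_in = len - (len % 2);	/* ignore a trailing odd byte */
	size_t cap, left_out;

	cd = iconv_open("UTF-8", "UCS-2BE");
	if (cd == (iconv_t) -1)
		return NULL;	/* encoding pair not supported */

	/* Each UCS-2 code unit needs at most 3 UTF-8 bytes, plus the NUL. */
	cap = (len / 2) * 3 + 1;
	out = malloc(cap);
	if (out == NULL) {
		iconv_close(cd);
		return NULL;
	}
	outp = out;
	left_out = cap - 1;	/* keep one byte for the terminator */

	if (iconv(cd, &inp, &left_in, &outp, &left_out) == (size_t) -1) {
		/* e.g. EILSEQ for input the converter rejects */
		free(out);
		iconv_close(cd);
		return NULL;
	}

	*outp = '\0';
	if (out_len)
		*out_len = (size_t) (outp - out);

	iconv_close(cd);
	return out;
}
```

A caller that needs the patch's terminator semantics would have to locate the terminator first and pass only the bytes before it.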

* Re: [PATCH] Add UCS-2 to UTF-8 conversion utility function.
@ 2009-07-06 16:26 Denis Kenzior
From: Denis Kenzior @ 2009-07-06 16:26 UTC
  To: ofono

Hi,

The rest of the codebase uses g_convert, which handles the necessary magic of 
invoking iconv.  So this function is really not necessary; use g_convert(buf, 
bufsize, "UTF-8//TRANSLIT", "UCS-2BE",... instead.

Regards,
-Denis

