From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751491AbcBLXNo (ORCPT ); Fri, 12 Feb 2016 18:13:44 -0500 Received: from mail-qg0-f45.google.com ([209.85.192.45]:36083 "EHLO mail-qg0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750852AbcBLXNn (ORCPT ); Fri, 12 Feb 2016 18:13:43 -0500 From: Jason Andryuk To: Peter Jones , Matt Fleming , linux-kernel@vger.kernel.org Cc: Matthew Garrett , Jason Andryuk Subject: [PATCH] lib/ucs2_string: Correct ucs2 -> utf8 conversion Date: Fri, 12 Feb 2016 23:13:33 +0000 Message-Id: <1455318813-30667-1-git-send-email-jandryuk@gmail.com> X-Mailer: git-send-email 2.4.3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The comparisons should be >= since 0x800 and 0x80 require an additional bit to store. For the 3 byte case, the existing shift would drop off 2 more bits than intended. For the 2 byte case, there should be 5 bits bits in byte 1, and 6 bits in byte 2. Signed-off-by: Jason Andryuk --- Tested in user space, but not in the kernel. Conversions now match python's unicode conversions. lib/ucs2_string.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/lib/ucs2_string.c b/lib/ucs2_string.c index 17dd74e..f0b323a 100644 --- a/lib/ucs2_string.c +++ b/lib/ucs2_string.c @@ -59,9 +59,9 @@ ucs2_utf8size(const ucs2_char_t *src) for (i = 0; i < ucs2_strlen(src); i++) { u16 c = src[i]; - if (c > 0x800) + if (c >= 0x800) j += 3; - else if (c > 0x80) + else if (c >= 0x80) j += 2; else j += 1; @@ -88,19 +88,19 @@ ucs2_as_utf8(u8 *dest, const ucs2_char_t *src, unsigned long maxlength) for (i = 0; maxlength && i < limit; i++) { u16 c = src[i]; - if (c > 0x800) { + if (c >= 0x800) { if (maxlength < 3) break; maxlength -= 3; dest[j++] = 0xe0 | (c & 0xf000) >> 12; - dest[j++] = 0x80 | (c & 0x0fc0) >> 8; + dest[j++] = 0x80 | (c & 0x0fc0) >> 6; dest[j++] = 0x80 | (c & 0x003f); - } else if (c > 0x80) { + } else if (c >= 0x80) { if (maxlength < 2) break; maxlength -= 2; - dest[j++] = 0xc0 | (c & 0xfe0) >> 5; - dest[j++] = 0x80 | (c & 0x01f); + dest[j++] = 0xc0 | (c & 0x7c0) >> 6; + dest[j++] = 0x80 | (c & 0x03f); } else { maxlength -= 1; dest[j++] = c & 0x7f; -- 2.4.3