From: "H. Peter Anvin" <hpa@zytor.com>
To: Adam Borowski <kilobyte@angband.pl>
Cc: Roy Franz <roy.franz@linaro.org>,
	linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
	matt.fleming@intel.com, leif.lindholm@linaro.org,
	msalter@redhat.com
Subject: Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.
Date: Wed, 18 Sep 2013 22:46:39 -0500
Message-ID: <523A739F.9050103@zytor.com>
In-Reply-To: <20130919034406.GA26385@angband.pl>

On 09/18/2013 10:44 PM, Adam Borowski wrote:
> 
> In fact, these days it's 8-bit encodings that are more likely to be Unicode
> than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
> In either case, though, what we have here is a 7-bit charset encoded as
> either 8-bit or 16-bit units.  What this function does is blindly truncate
> the upper byte.  The supported payload is in both cases ASCII.
> 
> I'd thus rename the function to what it already does: truncating u16 to u8,
> and adjust comments accordingly.
> 
> Replacing values above 126 with a token character like '?' would be good
> too: that'd avoid producing corrupted characters and/or random ASCII chars.
> 
> Your commit only moves things around, so it might be out of scope for now,
> but I wonder: what if the kernel actually supported Unicode here?  Few
> cmdline arguments take values where non-ASCII makes sense, but at least some
> do: for example, a Russian guy is not unlikely to name subvolumes using
> Cyrillic.  Supporting that would be easy (estimating the length then
> utf16s_to_utf8s()).  There's just one problem: which encoding to use, but
> these days, most distributions have either dropped non-UTF8 or hardly pay
> lip service, so we could get away with hard-coding UTF-8: those few who
> use ancient charsets can stick to ASCII.  Would this be ok?  If so, shout,
> I can code this if you don't care enough.
> 
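
The rename-and-replace suggestion quoted above (truncate u16 to u8, but emit
'?' for anything above 126) could look roughly like the sketch below. The
function name u16_to_ascii() and its signature are hypothetical, chosen only
for illustration; this is not the code from the patch under review.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the function from the patch): copy up to n
 * 16-bit units from src into dst, stopping early at a NUL unit.
 * Any unit above 126 becomes '?' instead of being blindly truncated.
 * dst must have room for n + 1 bytes; returns the number of chars written.
 */
static size_t u16_to_ascii(char *dst, const uint16_t *src, size_t n)
{
	size_t i;

	for (i = 0; i < n && src[i] != 0; i++)
		dst[i] = (src[i] > 126) ? '?' : (char)src[i];
	dst[i] = '\0';
	return i;
}
```

With plain truncation, U+0416 would turn into the control byte 0x16; with the
replacement it shows up as a visible '?'.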

We should, indeed, do proper conversion to UTF-8 here.

I also suspect we should assume the input is UTF-16 rather than UCS-2,
although that is a bit more exotic.
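
To make the UTF-16 versus UCS-2 distinction concrete: UTF-16 input may
contain surrogate pairs, which have to be combined into one code point before
being encoded as a 4-byte UTF-8 sequence. The following is a minimal,
standalone sketch under my own assumptions; the name utf16_to_utf8_sketch(),
the buffer contract, and the choice to map unpaired surrogates to U+FFFD are
illustrative and are not taken from the kernel's utf16s_to_utf8s().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch only. Converts n UTF-16 code units to UTF-8.
 * dst must hold at least 3 * n + 1 bytes (a BMP unit needs at most 3
 * bytes; a surrogate pair uses 2 units for 4 bytes).
 * Unpaired surrogates are replaced with U+FFFD (an assumption of this
 * sketch). Returns the number of bytes written, excluding the final NUL.
 */
static size_t utf16_to_utf8_sketch(uint8_t *dst, const uint16_t *src, size_t n)
{
	size_t i = 0, o = 0;

	while (i < n) {
		uint32_t cp = src[i++];

		if (cp >= 0xD800 && cp <= 0xDBFF) {
			/* High surrogate: valid only if a low one follows. */
			if (i < n && src[i] >= 0xDC00 && src[i] <= 0xDFFF)
				cp = 0x10000 + ((cp - 0xD800) << 10)
					     + (uint32_t)(src[i++] - 0xDC00);
			else
				cp = 0xFFFD;
		} else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			cp = 0xFFFD;	/* stray low surrogate */
		}

		if (cp < 0x80) {
			dst[o++] = (uint8_t)cp;
		} else if (cp < 0x800) {
			dst[o++] = (uint8_t)(0xC0 | (cp >> 6));
			dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
		} else if (cp < 0x10000) {
			dst[o++] = (uint8_t)(0xE0 | (cp >> 12));
			dst[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
			dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
		} else {
			dst[o++] = (uint8_t)(0xF0 | (cp >> 18));
			dst[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
			dst[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
			dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
		}
	}
	dst[o] = '\0';
	return o;
}
```

A UCS-2-only converter would treat each half of a surrogate pair as a
separate 16-bit character and emit two invalid 3-byte sequences; handling the
pair as above is the only extra work UTF-16 requires.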

	-hpa

