From: Adam Borowski <kilobyte@angband.pl>
To: Roy Franz <roy.franz@linaro.org>
Cc: linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
matt.fleming@intel.com, leif.lindholm@linaro.org,
msalter@redhat.com
Subject: Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.
Date: Thu, 19 Sep 2013 05:44:06 +0200
Message-ID: <20130919034406.GA26385@angband.pl>
In-Reply-To: <1379391093-27948-10-git-send-email-roy.franz@linaro.org>
On Mon, Sep 16, 2013 at 09:11:25PM -0700, Roy Franz wrote:
> +/* Convert the unicode UEFI command line to ASCII to pass to kernel.
> + * Size of memory allocated return in *cmd_line_len.
> + * Returns NULL on error.
> + */
> +static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
> + int load_options_size = image->load_options_size / 2; /* ASCII */
> + for (i = 0; i < options_size - 1; i++)
> + *s1++ = *s2++;
I'm afraid both this commit and the comments inside it are misnamed. What
you're changing here is the encoding rather than the character set.
In fact, these days it's 8-bit encodings that are more likely to be Unicode
than 16-bit ones: UTF-8 is ubiquitous, while with 16-bit units you usually
get UCS-2 at most.
In either case, though, what we have here is a 7-bit charset encoded as
either 8-bit or 16-bit units. What this function does is blindly truncate
the upper byte. In both cases, the supported payload is ASCII.
I'd thus rename the function to describe what it already does (truncating
u16 to u8) and adjust the comments accordingly.
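A minimal sketch of what the renamed helper would look like, assuming it is a standalone loop over host-endian 16-bit units with a caller-supplied output buffer; the name efi_utf16_truncate_to_u8() is hypothetical and not from the patch. It makes the lossy behavior explicit: 'Ł' (U+0141) comes out as a plain 'A', and 'Ж' (U+0416) as the control byte 0x16.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical name for what the patch's helper actually does: copy each
 * 16-bit UEFI character into one byte by dropping the upper byte. Only
 * ASCII (0x00-0x7F) survives this; everything else is mangled.
 * 'len' counts 16-bit units; 'dst' must have room for 'len' bytes.
 */
static void efi_utf16_truncate_to_u8(uint8_t *dst, const uint16_t *src,
				     size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = (uint8_t)src[i];	/* keep the low byte only */
}
```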
Replacing values above 126 with a placeholder character such as '?' would
be good too: since the low byte left after truncation is unrelated to the
original character, that would avoid producing corrupted characters or
random ASCII ones.
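A sketch of the '?' substitution, under the same assumptions (host-endian 16-bit input, caller-supplied buffer of len bytes); efi_utf16_to_ascii_lossy() is a hypothetical name. Any unit above 126 ('~') becomes '?', which also catches 127 (DEL).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical variant: anything above 126 ('~') has no printable
 * single-byte ASCII form (127 is DEL), so emit a '?' placeholder instead
 * of whatever the low byte happens to be.
 */
static void efi_utf16_to_ascii_lossy(char *dst, const uint16_t *src,
				     size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] > 126 ? '?' : (char)src[i];
}
```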
Your commit only moves things around, so this might be out of scope for
now, but I wonder: what if the kernel actually supported Unicode here? Few
cmdline arguments take values where non-ASCII makes sense, but at least
some do: for example, a Russian user may well name subvolumes in
Cyrillic. Supporting that would be easy: estimate the output length, then
call utf16s_to_utf8s(). There's just one problem, which encoding to use,
but these days most distributions have either dropped non-UTF-8 locales or
pay them only lip service, so we could get away with hard-coding UTF-8;
the few who use ancient charsets can stick to ASCII. Would this be OK? If
so, shout; I can code this if you don't care enough.
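A standalone sketch of the two-pass approach described above (size the output first, then encode). It does not call the kernel's utf16s_to_utf8s(); all function names here are hypothetical. It assumes host-endian UTF-16 input; since UEFI strings are nominally UCS-2, surrogate pairs are decoded only in case firmware passes them, and unpaired surrogates become '?'.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point at src[*i] and advance *i past the units used. */
static uint32_t utf16_next(const uint16_t *src, size_t len, size_t *i)
{
	uint32_t c = src[(*i)++];

	if (c >= 0xD800 && c <= 0xDBFF) {		/* high surrogate */
		if (*i < len && src[*i] >= 0xDC00 && src[*i] <= 0xDFFF) {
			uint32_t lo = src[(*i)++];
			return 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
		}
		return '?';				/* unpaired high */
	}
	if (c >= 0xDC00 && c <= 0xDFFF)			/* stray low surrogate */
		return '?';
	return c;
}

/* Number of UTF-8 bytes needed for code point c. */
static size_t utf8_len(uint32_t c)
{
	return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}

/* Pass 1: bytes the UTF-8 output will need (no NUL terminator). */
static size_t utf16_to_utf8_size(const uint16_t *src, size_t len)
{
	size_t i = 0, n = 0;

	while (i < len)
		n += utf8_len(utf16_next(src, len, &i));
	return n;
}

/* Pass 2: encode into dst, which must hold utf16_to_utf8_size() bytes. */
static size_t utf16_to_utf8(uint8_t *dst, const uint16_t *src, size_t len)
{
	size_t i = 0, o = 0;

	while (i < len) {
		uint32_t c = utf16_next(src, len, &i);

		switch (utf8_len(c)) {
		case 1:
			dst[o++] = (uint8_t)c;
			break;
		case 2:
			dst[o++] = (uint8_t)(0xC0 | (c >> 6));
			dst[o++] = (uint8_t)(0x80 | (c & 0x3F));
			break;
		case 3:
			dst[o++] = (uint8_t)(0xE0 | (c >> 12));
			dst[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
			dst[o++] = (uint8_t)(0x80 | (c & 0x3F));
			break;
		default:
			dst[o++] = (uint8_t)(0xF0 | (c >> 18));
			dst[o++] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
			dst[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
			dst[o++] = (uint8_t)(0x80 | (c & 0x3F));
			break;
		}
	}
	return o;
}
```

In a real stub the caller would allocate utf16_to_utf8_size() + 1 bytes and NUL-terminate the result; that allocation step is left out here.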
--
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ