* [PATCH] VHD: Fix locale aware character encoding handling
From: Philipp Hahn @ 2015-03-08 10:54 UTC
To: xen-devel; +Cc: Philipp Hahn
ASCII is 7-bit only, which does not work in UTF-8 environments:
> failed to read parent name
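To illustrate the failure, here is a minimal C sketch (not part of the patch; the helper name ascii_to_utf8() is hypothetical) that converts a file name from "ASCII" to "UTF-8" the way the old code did. Given the UTF-8 bytes of "ä.vhd" (0xC3 0xA4 ...), iconv() stops at the first byte above 0x7F and returns (size_t)-1 (on glibc, errno is EILSEQ). Pure ASCII names convert fine, which is why the problem only shows up with non-ASCII file names.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the old libvhd behaviour: treat the
 * input as "ASCII" and convert it to "UTF-8".
 * Returns 0 on success or -errno on failure.
 */
static int ascii_to_utf8(const char *in, char *out, size_t outlen)
{
	assert(in != NULL && out != NULL);

	iconv_t cd = iconv_open("UTF-8", "ASCII");
	char *inp = (char *)in;   /* POSIX iconv() takes char ** */
	char *outp = out;
	size_t ibl = strlen(in), obl = outlen;
	int err = 0;

	if (cd == (iconv_t)-1)
		return -errno;
	if (iconv(cd, &inp, &ibl, &outp, &obl) == (size_t)-1)
		err = -errno;     /* any byte >= 0x80 is invalid ASCII */
	iconv_close(cd);
	return err;
}
```

Usage: ascii_to_utf8("a.vhd", buf, sizeof buf) succeeds, while passing "\xc3\xa4.vhd" (the UTF-8 spelling of "ä.vhd") fails.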
Set up the locale in vhd-util from LC_CTYPE and use the matching codeset
when encoding and decoding file names.
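A minimal sketch of the locale-aware conversion, assuming the POSIX nl_langinfo() and iconv() interfaces; locale_name_to_utf8() is a hypothetical stand-alone helper, not a libvhd function. It relies on setlocale(LC_CTYPE, "") having been called first, which is what the vhd-util.c hunk adds to main(). Without that call the program stays in the "C" locale and nl_langinfo(CODESET) typically reports an ASCII codeset, so nothing would change.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: convert a file name from the codeset of the
 * current LC_CTYPE locale to UTF-8. Assumes setlocale(LC_CTYPE, "")
 * was already called once at program start.
 * Returns the number of bytes written, or -errno on failure.
 */
static long locale_name_to_utf8(const char *name, char *out, size_t outlen)
{
	assert(name != NULL && out != NULL);

	const char *codeset = nl_langinfo(CODESET); /* e.g. "UTF-8" */
	iconv_t cd = iconv_open("UTF-8", codeset);
	char *inp = (char *)name;  /* POSIX iconv() takes char ** */
	char *outp = out;
	size_t ibl = strlen(name), obl = outlen;
	long ret;

	if (cd == (iconv_t)-1)
		return -errno;
	if (iconv(cd, &inp, &ibl, &outp, &obl) == (size_t)-1)
		ret = -errno;             /* EILSEQ, EINVAL or E2BIG */
	else
		ret = (long)(outlen - obl); /* bytes actually produced */
	iconv_close(cd);
	return ret;
}
```

Usage: call setlocale(LC_CTYPE, "") once at startup, then locale_name_to_utf8(name, buf, sizeof buf) for each file name.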
Increase the allocation for the UTF-8 buffer, as one character of the
locale's codeset might need twice as much space in UTF-8 (or more).
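The "(or more)" matters. A character from a single-byte codeset can need up to 3 bytes in UTF-8: the euro sign is byte 0xA4 in ISO-8859-15 but U+20AC, 3 bytes, in UTF-8. Any code point needs at most 4 bytes (RFC 3629). The sketch below only does that arithmetic. The helper names are hypothetical, and utf8_worst_case() additionally assumes that each input byte decodes to at most one code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs for one Unicode code point (RFC 3629). */
static size_t utf8_seq_len(uint32_t cp)
{
	assert(cp <= 0x10FFFF);
	if (cp < 0x80)
		return 1;   /* ASCII */
	if (cp < 0x800)
		return 2;   /* e.g. U+00E4 'ä' */
	if (cp < 0x10000)
		return 3;   /* e.g. U+20AC '€' */
	return 4;           /* supplementary planes */
}

/*
 * Conservative UTF-8 output size for len input bytes, ASSUMING each
 * input byte yields at most one code point (true for single-byte
 * codesets and for UTF-8 input).
 */
static size_t utf8_worst_case(size_t len)
{
	return len * 4;
}
```

With this bound, a name made only of euro signs in ISO-8859-15 would overflow a len * 2 buffer but fit in len * 4.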
Don't require outbytesleft==0: the output buffer is sized for the worst
case, and a UTF-8 character takes anywhere from 1 to 4 bytes, so it is
perfectly fine (and expected) for output bytes to be left over.
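A small sketch of why outbytesleft is expected to be non-zero. Converting "ä.vhd" (6 UTF-8 bytes) to UTF-16LE into a 64-byte buffer fills only 10 bytes, so obl ends at 54 even though the conversion succeeded. The old `ibl || obl` check would have reported that as an error; only inbytesleft (together with iconv()'s return value) tells whether everything was converted. utf8_to_utf16le() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/*
 * Hypothetical helper: convert a UTF-8 name to UTF-16LE (the form the
 * W2u locators use) into a caller-sized buffer. Success is judged by
 * iconv()'s return value and inbytesleft only; outbytesleft is reported
 * back through *left, not treated as an error.
 */
static long utf8_to_utf16le(const char *in, char *out, size_t outlen,
			    size_t *left)
{
	assert(in != NULL && out != NULL && left != NULL);

	iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
	char *inp = (char *)in, *outp = out;
	size_t ibl = strlen(in), obl = outlen;
	long ret;

	if (cd == (iconv_t)-1)
		return -errno;
	errno = 0;
	if (iconv(cd, &inp, &ibl, &outp, &obl) == (size_t)-1 || ibl)
		ret = errno ? -errno : -EIO; /* same shape as the patched check */
	else
		ret = (long)(outlen - obl);  /* bytes actually written */
	iconv_close(cd);
	*left = obl;  /* usually > 0: the buffer was sized for the worst case */
	return ret;
}
```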
Test-case:
$ ./vhd-util create -n ä.vhd -s 1
$ ./vhd-util snapshot -n snap.vhd -p ä.vhd ; echo $?
See
<http://unix.stackexchange.com/questions/48689/effect-of-lang-on-terminal>
for more details on handling the encoding correctly.
Signed-off-by: Philipp Hahn <hahn@univention.de>
---
tools/blktap2/vhd/lib/libvhd.c | 27 +++++++++++++++++++--------
tools/blktap2/vhd/vhd-util.c | 3 +++
2 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/tools/blktap2/vhd/lib/libvhd.c b/tools/blktap2/vhd/lib/libvhd.c
index 95eb5d6..1fd5b4e 100644
--- a/tools/blktap2/vhd/lib/libvhd.c
+++ b/tools/blktap2/vhd/lib/libvhd.c
@@ -37,6 +37,7 @@
#include <iconv.h>
#include <sys/mman.h>
#include <sys/stat.h>
+#include <langinfo.h>
#include "libvhd.h"
#include "relative-path.h"
@@ -1296,6 +1297,7 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
size_t ibl, obl;
char *uri, *uri_utf8, *uri_utf8p, *ret;
const char *urip;
+ char *codeset;
err = 0;
ret = NULL;
@@ -1304,7 +1306,7 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
len = strlen(name) + strlen("file://");
ibl = len;
- obl = len;
+ obl = len * 2;
urip = uri = malloc(ibl + 1);
uri_utf8 = uri_utf8p = malloc(obl);
@@ -1312,7 +1314,8 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
if (!uri || !uri_utf8)
return -ENOMEM;
- cd = iconv_open("UTF-8", "ASCII");
+ codeset = nl_langinfo(CODESET);
+ cd = iconv_open("UTF-8", codeset);
if (cd == (iconv_t)-1) {
err = -errno;
goto out;
@@ -1325,7 +1328,7 @@ vhd_macx_encode_location(char *name, char **out, int *outlen)
(char **)
#endif
&urip, &ibl, &uri_utf8p, &obl) == (size_t)-1 ||
- ibl || obl) {
+ ibl) {
err = (errno ? -errno : -EIO);
goto out;
}
@@ -1357,6 +1360,7 @@ vhd_w2u_encode_location(char *name, char **out, int *outlen)
size_t ibl, obl;
char *uri, *uri_utf16, *uri_utf16p, *tmp, *ret;
const char *urip;
+ char *codeset;
err = 0;
ret = NULL;
@@ -1404,7 +1408,8 @@ vhd_w2u_encode_location(char *name, char **out, int *outlen)
* MICROSOFT_COMPAT
* little endian unicode here
*/
- cd = iconv_open("UTF-16LE", "ASCII");
+ codeset = nl_langinfo(CODESET);
+ cd = iconv_open("UTF-16LE", codeset);
if (cd == (iconv_t)-1) {
err = -errno;
goto out;
@@ -1415,7 +1420,7 @@ vhd_w2u_encode_location(char *name, char **out, int *outlen)
(char **)
#endif
&urip, &ibl, &uri_utf16p, &obl) == (size_t)-1 ||
- ibl || obl) {
+ ibl) {
err = (errno ? -errno : -EIO);
goto out;
}
@@ -1447,11 +1452,13 @@ vhd_macx_decode_location(const char *in, char *out, int len)
iconv_t cd;
char *name;
size_t ibl, obl;
+ char *codeset;
name = out;
ibl = obl = len;
- cd = iconv_open("ASCII", "UTF-8");
+ codeset = nl_langinfo(CODESET);
+ cd = iconv_open(codeset, "UTF-8");
if (cd == (iconv_t)-1)
return NULL;
@@ -1479,11 +1486,13 @@ vhd_w2u_decode_location(const char *in, char *out, int len, char *utf_type)
iconv_t cd;
char *name, *tmp;
size_t ibl, obl;
+ char *codeset;
tmp = name = out;
ibl = obl = len;
- cd = iconv_open("ASCII", utf_type);
+ codeset = nl_langinfo(CODESET);
+ cd = iconv_open(codeset, utf_type);
if (cd == (iconv_t)-1)
return NULL;
@@ -2450,6 +2459,7 @@ vhd_initialize_header_parent_name(vhd_context_t *ctx, const char *parent_path)
size_t ibl, obl;
char *ppath, *dst;
const char *pname;
+ char *codeset;
err = 0;
pname = NULL;
@@ -2459,7 +2469,8 @@ vhd_initialize_header_parent_name(vhd_context_t *ctx, const char *parent_path)
* MICROSOFT_COMPAT
* big endian unicode here
*/
- cd = iconv_open(UTF_16BE, "ASCII");
+ codeset = nl_langinfo(CODESET);
+ cd = iconv_open(UTF_16BE, codeset);
if (cd == (iconv_t)-1) {
err = -errno;
goto out;
diff --git a/tools/blktap2/vhd/vhd-util.c b/tools/blktap2/vhd/vhd-util.c
index 944a59e..13f1835 100644
--- a/tools/blktap2/vhd/vhd-util.c
+++ b/tools/blktap2/vhd/vhd-util.c
@@ -28,6 +28,8 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#include <langinfo.h>
+#include <locale.h>
#include "libvhd.h"
#include "vhd-util.h"
@@ -114,6 +116,7 @@ main(int argc, char *argv[])
if (setrlimit(RLIMIT_CORE, &rlim) < 0)
fprintf(stderr, "setrlimit failed: %d\n", errno);
#endif
+ setlocale(LC_CTYPE, "");
ret = 0;
--
1.9.1
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: [PATCH] VHD: Fix locale aware character encoding handling
From: Ian Campbell @ 2015-03-11 12:30 UTC
To: Philipp Hahn; +Cc: xen-devel
On Sun, 2015-03-08 at 11:54 +0100, Philipp Hahn wrote:
> ASCII is 7-bit only, which does not work in UTF-8 environments:
> > failed to read parent name
>
> Set up the locale in vhd-util from LC_CTYPE and use the matching codeset
> when encoding and decoding file names.
>
> Increase the allocation for the UTF-8 buffer, as one character of the
> locale's codeset might need twice as much space in UTF-8 (or more).
>
> Don't require outbytesleft==0: the output buffer is sized for the worst
> case, and a UTF-8 character takes anywhere from 1 to 4 bytes, so it is
> perfectly fine (and expected) for output bytes to be left over.
>
> Test-case:
> $ ./vhd-util create -n ä.vhd -s 1
> $ ./vhd-util snapshot -n snap.vhd -p ä.vhd ; echo $?
>
> See
> <http://unix.stackexchange.com/questions/48689/effect-of-lang-on-terminal>
> for more details on handling the encoding correctly.
>
> Signed-off-by: Philipp Hahn <hahn@univention.de>
I'm a bit perplexed as to why libvhd is even trying to interpret these
bytes; I probably don't want to know...
Anyway: acked + applied, thanks.
* Re: [PATCH] VHD: Fix locale aware character encoding handling
From: Philipp Hahn @ 2015-03-12 16:05 UTC
To: Ian Campbell; +Cc: xen-devel
Hello Ian,
On 11.03.2015 13:30, Ian Campbell wrote:
> On Sun, 2015-03-08 at 11:54 +0100, Philipp Hahn wrote:
>> ASCII is 7-bit only, which does not work in UTF-8 environments:
>>> failed to read parent name
...
>> Don't require outbytesleft==0: the output buffer is sized for the worst
>> case, and a UTF-8 character takes anywhere from 1 to 4 bytes, so it is
>> perfectly fine (and expected) for output bytes to be left over.
...
> I'm a bit perplexed as to why libvhd is even trying to interpret these
> bytes; I probably don't want to know...
If by "bytes" you mean the encoding used for the file name: when
creating a snapshot, the names are stored UTF-16 encoded for Windows and
UTF-8 encoded for Mac OS X compatibility. Therefore the utility needs to
know which encoding to convert from.
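To make the on-disk form concrete, here is a hedged sketch (hypothetical helper name_to_utf16be(), not a libvhd function) that encodes a parent name as big-endian UTF-16, matching the "big endian unicode here" comment in vhd_initialize_header_parent_name. It uses the literal iconv name "UTF-16BE"; the value of libvhd's UTF_16BE macro is not shown in this thread. The caller passes the source codeset, which is exactly the information the patch now takes from nl_langinfo(CODESET) instead of assuming ASCII.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/*
 * Hypothetical helper: encode a file name given in from_codeset as
 * big-endian UTF-16. vhd-util would pass nl_langinfo(CODESET) here.
 * Returns the number of bytes written, or -errno on failure.
 */
static long name_to_utf16be(const char *name, const char *from_codeset,
			    char *out, size_t outlen)
{
	assert(name != NULL && from_codeset != NULL && out != NULL);

	iconv_t cd = iconv_open("UTF-16BE", from_codeset);
	char *inp = (char *)name, *outp = out;
	size_t ibl = strlen(name), obl = outlen;
	long ret;

	if (cd == (iconv_t)-1)
		return -errno;            /* e.g. EINVAL: unknown codeset */
	if (iconv(cd, &inp, &ibl, &outp, &obl) == (size_t)-1)
		ret = -errno;
	else
		ret = (long)(outlen - obl);
	iconv_close(cd);
	return ret;
}
```

With from_codeset "UTF-8", the bytes C3 A4 2E 76 68 64 ("ä.vhd") become 00 E4 00 2E 00 76 00 68 00 64. In a Latin-1 locale the same name arrives as E4 2E 76 68 64 and, converted from "ISO-8859-1", yields the same UTF-16BE bytes. Decoding runs iconv the other way, e.g. iconv_open(codeset, utf_type) as in vhd_w2u_decode_location.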
If by "bytes" you mean the input/output bytes left: yeah, gory UTF-8
details.
> Anyway: acked + applied, thanks.
Thanks. I hope it also builds on BSD and wherever else the vhd utilities are used.
Philipp