From: Andrej Gelenberg <andrej.gelenberg@udo.edu>
To: Krzysztof <kj@limes.com.pl>
Cc: linux-c-programming@vger.kernel.org
Subject: Re: Unicode or not?
Date: Mon, 05 Mar 2012 21:52:45 +0100
Message-ID: <4F55279D.6040702@udo.edu>
In-Reply-To: <jj373v$aek$1@dough.gmane.org>

Hi,

no, you can't simply cast it to wchar_t*. A cast only reinterprets the
bytes; it does not convert the encoding. I recommend reading this article
about Unicode under Linux:
http://www.ibm.com/developerworks/linux/library/l-linuni/index.html

There are two possible ways to deal with UTF-8. The first is to keep it
as char* and use it as a plain C string. Pro: it's simple, you can keep
using the standard str* functions, and it is often smaller than a wchar_t
string. Con: non-ASCII characters take more than one byte, so strlen()
reports the number of bytes, not the number of characters, which can lead
to problems with displaying or counting characters. You can still count
them with mblen(), but it's a bit of a pain.
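To illustrate the mblen() route, here is a minimal sketch. The function name count_mb_chars() is made up for this example, and it assumes the program has already called setlocale(LC_ALL, "") while running under a UTF-8 locale; without that, mblen() works in the "C" locale and will not decode UTF-8.

```c
#include <stdlib.h>
#include <string.h>

/* Count the characters (not bytes) in a multibyte string.
 * Hypothetical helper: assumes the caller has already run
 * setlocale(LC_ALL, "") under a UTF-8 locale.
 * Returns -1 on an invalid or truncated sequence. */
long count_mb_chars(const char *s)
{
    size_t left = strlen(s);      /* bytes still to examine */
    long count = 0;

    mblen(NULL, 0);               /* reset the conversion state */
    while (left > 0) {
        int n = mblen(s, left);   /* bytes used by the next character */
        if (n <= 0)               /* invalid or incomplete sequence */
            return -1;
        s += n;
        left -= (size_t)n;
        count++;
    }
    return count;
}
```

For plain ASCII input the result equals strlen(); for text with multibyte characters it is smaller.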
The second option is to convert it to wchar_t with the mbstowcs() function.
Pro: every character has the same fixed width (on Linux, wchar_t is 32
bits, so each code point fits in one element). Cons: you need to convert
between UTF-8 and wchar_t, and you need an additional buffer to hold the
wchar_t string (you can't do it in place, because the wchar_t string will
often be bigger than the UTF-8 string).

For example, if you need or just want a wchar_t string, you can do
something like this:

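setlocale(LC_ALL, "");          /* once at startup; needs <locale.h> */
size_t l = strlen(argv[i]);     /* bytes: upper bound on characters */
wchar_t *nbuf = calloc(l + 1, sizeof(*nbuf)); /* +1 for the L'\0' */
if ( !nbuf ) return 1;
l = mbstowcs(nbuf, argv[i], l + 1); // mbstowcs may return a smaller
                                    // value than l
if ( l == (size_t)-1 ) {
  /* invalid multibyte sequence was encountered */
  free(nbuf);
  return 2;
}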
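A variant of the same idea, as a sketch only: instead of sizing the buffer from strlen(), ask mbstowcs() for the exact length first. This relies on the POSIX behaviour that a NULL destination makes mbstowcs() return the required length without writing anything. The helper name mb_to_wide() is an assumption for this example, and it again expects setlocale(LC_ALL, "") to have been called under a UTF-8 locale.

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert a multibyte string to a newly allocated wide string.
 * Hypothetical helper: assumes setlocale(LC_ALL, "") was already called.
 * Returns NULL on an invalid sequence or allocation failure.
 * The caller frees the result. */
wchar_t *mb_to_wide(const char *s, size_t *out_len)
{
    /* With a NULL destination, POSIX mbstowcs() only reports the length. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *buf = malloc((n + 1) * sizeof(*buf));
    if (!buf)
        return NULL;

    mbstowcs(buf, s, n + 1);   /* n + 1 leaves room for the terminator */
    if (out_len)
        *out_len = n;          /* characters, not counting the L'\0' */
    return buf;
}
```

Called as mb_to_wide(argv[i], &len), it returns a buffer of exactly len + 1 elements, so nothing is wasted when the argument contains many multibyte characters.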

Regards,
Andrej Gelenberg

On 03/05/2012 09:19 PM, Krzysztof wrote:
> So how to read effectively UTF-8 characters from char* passed as an
> argument under Linux?
> Should one simply cast argv[n] to wchar_t*?
> 

